Counting Number of Users Whose Average is Greater than X in Postgres - postgresql

I am trying to find the number of users who have scored an average of 80 or higher. I am using HAVING in my query, but it returns the matching rows instead of the number of rows.
The Schema looks like:
Results
user
test_no
question_no
score
My Query:
SELECT "user" FROM results WHERE (score >0) GROUP BY "user"
HAVING (sum(score) / count(distinct(test_no))) >= 80;
I get:
user
2
4
8
(3 rows)
Instead I would like to get 3 (the number of rows) as the output. If I use count("user"), I get the number of tests for each user.
I understand this is related to the use of GROUP BY, but I need it for my HAVING clause. Any suggestions on how I can do this are appreciated.
Update: Here is some sample data: http://pastebin.com/k1nH5Wzh (-1 means unanswered)
Thanks!

The query you found is good. Some minor simplifications:
SELECT count(*) AS ct
FROM (
SELECT 1
FROM results
WHERE score > 0
GROUP BY user_id
HAVING (sum(score) / count(DISTINCT test_no)) >= 80
) sub
DISTINCT does not require parentheses.
You can SELECT a constant value in the subquery. The value is irrelevant, since you are only going to count the rows. Slightly shorter and cheaper.
Don't use the reserved word user as column name. That's asking for trouble. I am using user_id instead.
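If you want to try the count-over-a-grouped-subquery idea without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The rows are made up for illustration (they are not the pastebin data), and user_id is the renamed column suggested above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (user_id INTEGER, test_no INTEGER, question_no INTEGER, score INTEGER)"
)
# Made-up rows: (user_id, test_no, question_no, score); -1 means unanswered.
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 50), (1, 1, 2, 40),                                # user 1: 90 over 1 test  -> 90
        (2, 1, 1, 30), (2, 1, 2, 20), (2, 2, 1, 40), (2, 2, 2, -1),  # user 2: 90 over 2 tests -> 45
        (3, 1, 1, 45), (3, 1, 2, 45), (3, 2, 1, 40), (3, 2, 2, 40),  # user 3: 170 over 2 tests -> 85
    ],
)

# The inner query yields one row per qualifying user; the outer query counts those rows.
qualifying_users = conn.execute(
    """
    SELECT count(*) AS ct
    FROM (
        SELECT 1
        FROM results
        WHERE score > 0
        GROUP BY user_id
        HAVING sum(score) / count(DISTINCT test_no) >= 80
    ) sub
    """
).fetchone()[0]

print(qualifying_users)  # 2 (users 1 and 3)
```

Note that both SQLite and Postgres use integer division when both operands are integers, so a per-test average of 79.5 would be truncated to 79 and fail the >= 80 check.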

I am not sure if this is an efficient way to do it but this seems to be working.
SELECT COUNT(*) FROM
(SELECT "user" FROM results WHERE (score >0) GROUP BY "user"
HAVING (sum(score) / count(distinct(test_no))) >= 80) q1;

Related

Postgres distinct query in two columns

I want to write a Postgres query. For every distinct combination of (career_id, uid), it should return the entire row that has the max time.
This is the sample data
id time career_id uid content
1 100 10000 5 Abc
2 300 6 7 xyz
3 200 10000 5 wxv
4 150 6 7 hgr
Ans:
id time career_id uid content
2 300 6 7 xyz
3 200 10000 5 wxv
This can be done using DISTINCT ON () in Postgres:
select distinct on (career_id, uid) *
from the_table
order by career_id, uid, "time" desc;
You can use CTEs for this. Something like this should work:
WITH cte_max_value AS (
SELECT
career_id,
uid,
max("time") as max_time
FROM mytable
GROUP BY career_id, uid
)
SELECT DISTINCT t.*
FROM mytable AS t
INNER JOIN cte_max_value AS cmv
ON t.uid = cmv.uid AND t.career_id = cmv.career_id AND t.time = cmv.max_time
The CTE gives you all the unique combinations of career_id and uid with the relevant maximum time; the inner join then joins the entire rows onto this. Note that if two rows have the same maximum time for the same combination of career_id and uid, you will get both rows returned.
If you don't want that, you will need to find a strategy to resolve this.
Edit: Also, the solution proposed by a_hrose_with_name is far nicer, and unless you need some level of compatibility with other servers (sadly, syntax varies) you should use that instead.
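One possible strategy for the tie case mentioned above is ROW_NUMBER() with an explicit tie-breaker. This is a sketch, not part of either answer: it runs on SQLite (3.25 or newer, for window functions) through Python's sqlite3 module as a stand-in for Postgres, the row with id 5 is a made-up tie added for illustration, and "lowest id wins" is an arbitrary rule you would replace with your own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE the_table (id INTEGER, "time" INTEGER, career_id INTEGER, uid INTEGER, content TEXT)'
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 10000, 5, "Abc"),
        (2, 300, 6, 7, "xyz"),
        (3, 200, 10000, 5, "wxv"),
        (4, 150, 6, 7, "hgr"),
        (5, 300, 6, 7, "tie"),  # hypothetical row that ties with id 2 on "time"
    ],
)

# Rank rows inside each (career_id, uid) group: latest time first, lowest id breaks ties.
latest_rows = conn.execute(
    """
    SELECT id, "time", career_id, uid, content
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY career_id, uid
                   ORDER BY "time" DESC, id ASC
               ) AS rn
        FROM the_table AS t
    ) AS ranked
    WHERE rn = 1
    ORDER BY career_id, uid
    """
).fetchall()

for row in latest_rows:
    print(row)
# (2, 300, 6, 7, 'xyz')
# (3, 200, 10000, 5, 'wxv')
```

The same tie-breaker can also be appended to the ORDER BY of the DISTINCT ON query if you stay on Postgres.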

PostgreSQL - Unexpected division by zero using SUM

This query (minimal reproducible example):
WITH t as (
SELECT 3 id, 2 price, 0 amount
)
SELECT
CASE WHEN amount > 0 THEN
SUM(price / amount)
ELSE
price
END u_price
FROM t
GROUP BY id, price, amount
on PostgreSQL 9.4 throws
division by zero
Without the SUM it works.
How is this possible?
I liked this question and I turned for help to these tough guys :
The planner is guilty:
A CASE cannot prevent evaluation of an aggregate expression contained
within it, because aggregate expressions are computed before other
expressions in a SELECT list or HAVING clause are considered
More details at https://www.postgresql.org/docs/10/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
I cannot figure the "why" part out, but here is a workaround...
WITH t as (
SELECT 3 id, 2 price, 0 amount
)
SELECT SUM(price / case when amount = 0 then 1 else amount end) u_cena
FROM t
GROUP BY id, price, amount
OR: you can use the following and avoid the "case"
SELECT SUM(price / power(amount,sign(amount))) u_cena
FROM t
GROUP BY id, price, amount

Order by two columns with different priority - Postgres

I want to be able to order users by two columns:
Number of followers they have
Number of the same following users that I am following - for similarity
Here's my query for now
SELECT COUNT(fSimilar.id) as similar_follow, COUNT(fCount.id) as followers_count, users.name FROM users
LEFT JOIN follows fSimilar ON fSimilar.user_id = users.id
AND fSimilar.following_id IN (
SELECT following_id FROM follows WHERE user_id = 1 -- 1 is my user id
)
LEFT JOIN follows fCount ON fCount.following_id = users.id
WHERE users.name LIKE 'test%'
GROUP BY users.name
ORDER BY followers_count * 0.3 + similar_follow * 0.7 DESC
This selects people that follow the same people as me and also considers their popularity (amount of followers). This is similar to Instagram search.
I prioritise similar_follow by 70% or 0.7, and followers_count by 30%. However followers_count * 0.3 doesn't provide ordering integrity. For example some users have 1 - 10 million followers, this causes followers_count to be too large and similar_follow becomes too small to have any impact on ordering.
I have considered doing followers_count/500 where 500 is the average number of followers. However this still doesn't play well for ordering.
I need a way to equalise followers_count and similar_follow, so multiplication by percentages (0.3 and 0.7) makes a difference for both values.
I also looked at https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9#.wuz8j0f4w which describes Wilson score interval but I am not sure if this is the right solution in my case, as I deal with 2 values (I might be wrong).
Thank you.
I usually use LOG() when normalizing data that has a large range. Also, to reiterate @Abelisto, your attempt to weight each column doesn't work in your implementation. Adding the two together should work.
for example:
...
ORDER BY LOG(followers_count) * 0.3 + LOG(similar_follow) * 0.7 DESC
What about multiplying by exponents (ie. similar_follow ^ 3.0 and followers_count ^ 1.5)?
Reference: https://www.postgresql.org/docs/9.1/static/functions-math.html
Thanks @ForRealHomie, I implemented a query that works. I'm still open to other suggestions :)
SELECT
users.id, users.name,
fSimilar.count + fPopular.count as followCount
FROM users
LEFT JOIN usernames ON usernames.user_id=users.id
LEFT JOIN (
SELECT LOG(COUNT(user_id) + 1) * 0.7 as count, user_id
FROM follows
WHERE username_id IN (SELECT username_id FROM follows WHERE user_id=1)
GROUP BY user_id
) fSimilar ON fSimilar.user_id = users.id
LEFT JOIN (SELECT LOG(COUNT(username_id) + 1) * 0.3 as count, username_id
FROM follows
GROUP BY username_id
) fPopular ON fPopular.username_id = usernames.id
WHERE users.id IN (2, 3 ,4)
ORDER BY followCount DESC
NB: the + 1 in LOG(COUNT(...) + 1) is needed because COUNT can return 0 and LOG doesn't accept 0, so adding 1 fixes the issue :)
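To see why the logarithm helps, here is a small Python sketch comparing the raw weighted sum with the log-scaled one. The two users and their counts are made up for illustration; math.log10 mirrors Postgres' single-argument LOG(), which is base 10.

```python
import math

def raw_score(followers_count, similar_follow):
    # Plain weighted sum: the follower count dominates when it is in the millions.
    return followers_count * 0.3 + similar_follow * 0.7

def log_score(followers_count, similar_follow):
    # Log-scaled weighted sum; + 1 keeps log10 defined when a count is 0.
    return math.log10(followers_count + 1) * 0.3 + math.log10(similar_follow + 1) * 0.7

# Hypothetical users: name -> (followers_count, similar_follow)
users = {
    "celebrity": (1_000_000, 0),
    "similar_taste": (500, 50),
}

raw_ranking = sorted(users, key=lambda name: raw_score(*users[name]), reverse=True)
log_ranking = sorted(users, key=lambda name: log_score(*users[name]), reverse=True)

print(raw_ranking)  # ['celebrity', 'similar_taste']
print(log_ranking)  # ['similar_taste', 'celebrity']
```

With the raw sum the weights barely matter; after the log, both columns sit in a comparable range, so the 0.7 / 0.3 split actually affects the order.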

Firebird get the list with all available id

In a table I have records with ids 2,4,5,8. How can I get a list of the values 1,3,6,7? I have tried it this way:
SELECT t1.id + 1
FROM table t1
WHERE NOT EXISTS (
SELECT *
FROM table t2
WHERE t2.id = t1.id + 1
)
but it's not working correctly. It doesn't bring all available positions.
Is it possible without another table?
You can get all the missing ID's from a recursive CTE, like this:
with recursive numbers as (
select 1 number
from rdb$database
union all
select number+1
from rdb$database
join numbers on numbers.number < 1024
)
select n.number
from numbers n
where not exists (select 1
from table t
where t.id = n.number)
The number < 1024 condition in my example limits the query to the maximum recursion depth of 1024. Beyond that, the query will end with an error. If you need more than 1024 consecutive IDs, you have to either run the query multiple times, adjusting the interval of numbers generated, or think of a different query that produces consecutive numbers without reaching that level of recursion, which is not too difficult to write.
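The same generate-then-filter idea can be tried quickly outside Firebird. The sketch below uses SQLite through Python's sqlite3 module, so it has no rdb$database and a different recursion limit; the table name items and the choice to stop at max(id) instead of a fixed 1024 are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(2,), (4,), (5,), (8,)])

# Generate 1..max(id) recursively, then keep the numbers that have no matching row.
missing = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE numbers(n) AS (
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM numbers WHERE n < (SELECT max(id) FROM items)
        )
        SELECT n
        FROM numbers
        WHERE NOT EXISTS (SELECT 1 FROM items WHERE items.id = numbers.n)
        ORDER BY n
        """
    )
]

print(missing)  # [1, 3, 6, 7]
```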

Get total count per ID change

How can I get a total count of sheets per change of sheetID?
example:
select sheetID,
..
from SomeTable
results look something like this:
sheetID
-----------
1000
1000
1000
1000
3000
3000
3000
so I want something like this:
select sheetID,
count(sheetID) as TotalsheetCount
from SomeTable
I just don't know how to break the count up per change of sheetID.
So I'd end up with this essentially:
sheetID TotalsheetCount
-------- -----------
1000 4
1000 4
1000 4
1000 4
3000 3
3000 3
3000 3
so 4 is because there are 4 1000s, 3 because there are 3 3000s. I am wanting to repeat the total count for that sheetID for each row, even though it's repeating, I want to provide that.
UPDATE: here's what I did per the replies, but I'm getting way too many results now compared to the count I had before adding the partition count:
select MainTable.sheetID,
COUNT(SomeTable.sheetID)OVER(PARTITION BY SomeTable.sheetID) AS TotalSheetCount
table2.SomeField1,
table2.SomeField1
from MainTable
join (select distinct Sales.SalesKey from SomeLongTableName_Sales) sales on sales.SheetKey = MainTable.sheetKey
left outer join Site on MainTable.SiteKey = Site.SiteKey
join Calendar on sales.Date >= Calendar.StartDate
and sales.Date < Calendar.EndDate
group by SomeTable.sheetID
the joins and stuff is more realistic to my real query but formatted for this post to hide real field and table names.
You probably want to use a GROUP BY:
SELECT sheetID, COUNT(sheetID) AS TotalsheetCount
FROM dbo.SomeTable
GROUP BY sheetID
I am wanting to repeat the total count for that sheetID for each row,
even though it's repeating, I want to provide that
If you're using at least SQL-Server 2005, you can use a CTE with COUNT + OVER-clause, otherwise use a sub-query:
WITH CTE AS
(
SELECT sheetID,
COUNT(sheetID)OVER(PARTITION BY sheetID) AS TotalsheetCount
FROM SomeTable
)
SELECT sheetID, TotalsheetCount FROM CTE
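For a quick check of what COUNT + OVER returns, here is a minimal sketch that runs the window query against the sample data from the question. It uses SQLite (3.25 or newer) via Python's sqlite3 module as a stand-in for SQL Server, which is an assumption made only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (sheetID INTEGER)")
# Four rows for sheet 1000 and three rows for sheet 3000, as in the question.
conn.executemany(
    "INSERT INTO SomeTable (sheetID) VALUES (?)",
    [(1000,)] * 4 + [(3000,)] * 3,
)

# The window function keeps every row and attaches the per-sheetID total to each one.
rows = conn.execute(
    """
    SELECT sheetID,
           COUNT(sheetID) OVER (PARTITION BY sheetID) AS TotalsheetCount
    FROM SomeTable
    ORDER BY sheetID
    """
).fetchall()

for row in rows:
    print(row)
# (1000, 4) is printed four times, then (3000, 3) three times
```

Unlike GROUP BY, the window version does not collapse rows, which is why it should not be combined with a GROUP BY on the same column if you want the repeated output.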
Use the GROUP BY clause in a subquery to select the counts:
SELECT sheetID,
count(sheetID) as TotalsheetCount
FROM SomeTable
GROUP BY sheetID
This would make your whole query look like this:
SELECT t.sheetID,
counts.TotalsheetCount
FROM SomeTable t,
(SELECT sheetID, count(sheetID) as TotalsheetCount FROM SomeTable GROUP BY sheetID) counts
WHERE t.sheetID = counts.sheetID
It looks like you need a group-by expression:
select sheetID,
count(*) as TotalsheetCount
from SomeTable
group by sheetID
Is that it?
DC