Order by two columns with different priority - PostgreSQL

I want to be able to order users by two columns:
Number of followers they have
Number of users they follow that I also follow - for similarity
Here's my query for now
SELECT COUNT(fSimilar.id) AS similar_follow, COUNT(fCount.id) AS followers_count, users.name
FROM users
LEFT JOIN follows fSimilar ON fSimilar.user_id = users.id
  AND fSimilar.following_id IN (
    SELECT following_id FROM follows WHERE user_id = 1 -- 1 is my user id
  )
LEFT JOIN follows fCount ON fCount.following_id = users.id
WHERE users.name LIKE 'test%'
GROUP BY users.name
ORDER BY followers_count * 0.3 + similar_follow * 0.7 DESC
This selects people that follow the same people as me and also considers their popularity (amount of followers). This is similar to Instagram search.
I prioritise similar_follow at 70% (0.7) and followers_count at 30% (0.3). However, followers_count * 0.3 doesn't preserve the intended ordering: some users have 1-10 million followers, which makes followers_count so large that similar_follow becomes too small to have any impact on the ordering.
I have considered dividing by the average number of followers (followers_count / 500, where 500 is the average), but this still doesn't order well.
I need a way to equalise followers_count and similar_follow, so multiplication by percentages (0.3 and 0.7) makes a difference for both values.
I also looked at https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9#.wuz8j0f4w which describes Wilson score interval but I am not sure if this is the right solution in my case, as I deal with 2 values (I might be wrong).
Thank you.

I usually use LOG() when normalizing data with a large range. Also, to reiterate @Abelisto's comment: your attempt to weight each column doesn't work in your implementation. Adding the two together should work.
for example:
...
ORDER BY LOG(followers_count) * 0.3 + LOG(similar_follow) * 0.7 DESC
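To see why this helps, compare magnitudes (Postgres LOG() is base 10; the numbers below are illustrative):

```sql
-- A 20,000x raw gap shrinks to roughly 2.6x after LOG():
SELECT LOG(10000000) AS mega_user,    -- 7
       LOG(500)      AS typical_user; -- ~2.7
```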

What about raising them to exponents (i.e. similar_follow ^ 3.0 and followers_count ^ 1.5)?
Reference: https://www.postgresql.org/docs/9.1/static/functions-math.html

Thanks @ForRealHomie, I implemented a query that works. I'm still open to other suggestions :)
SELECT
  users.id, users.name,
  fSimilar.count + fPopular.count AS followCount
FROM users
LEFT JOIN usernames ON usernames.user_id = users.id
LEFT JOIN (
  SELECT LOG(COUNT(user_id) + 1) * 0.7 AS count, user_id
  FROM follows
  WHERE username_id IN (SELECT username_id FROM follows WHERE user_id = 1)
  GROUP BY user_id
) fSimilar ON fSimilar.user_id = users.id
LEFT JOIN (
  SELECT LOG(COUNT(username_id) + 1) * 0.3 AS count, username_id
  FROM follows
  GROUP BY username_id
) fPopular ON fPopular.username_id = usernames.id
WHERE users.id IN (2, 3, 4)
ORDER BY followCount DESC
NB: the + 1 in LOG(COUNT(...) + 1) is needed because COUNT can produce 0 and LOG isn't defined at 0; adding 1 avoids the error :)
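One caveat with the query above: a user who matches neither subquery gets NULL from both LEFT JOINs, NULL + NULL is NULL, and Postgres sorts NULLs first under DESC, so such users would float to the top. A COALESCE guard in the SELECT list avoids that (a sketch, using the same aliases as above):

```sql
-- NULL propagates through arithmetic:
SELECT NULL::numeric + 5;  -- NULL
-- So default the missing counts to 0 in the SELECT list:
-- COALESCE(fSimilar.count, 0) + COALESCE(fPopular.count, 0) AS followCount
```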

Related

Postgresql Use a specified value instead of first_value

I am trying to replace first_value with a specified value to use in an equation, because alphabetical sorting is causing an issue. I want to use the row whose value is 'control' in the segment column of the direct_mail_test table, so I need a way to reference just that value ('control') in the equation. I'm new to PostgreSQL; any help would be greatly appreciated.
Here is my current code
select segment,
count(*) as total,
count(b.c_guid) as bookings,
100.0 * count(b.c_guid)/count(*) as percent,
100.0 * count(b.c_guid) / first_value(count(b.c_guid)) over ( order by segment asc ) as comp
from mailing_tests
left join (
select distinct g.contact_guid as c_guid
from guest g
inner join booking
on booking.guid = g.booking_guid
where booking.book_date >= {{date_start}}
[[and booking.book_date < {{date_end}}]]
and booking.status in ('Booked')
) b
on mailing_tests.guid = b.c_guid
where {{project}}
group by segment
order by segment asc
Here is my output:
segment      total   bookings  percent  comp
catalog      4,091   30        0.73     100
control      30,611  359       1.17     1,196.67
direct_mail  30,611  393       1.28     1,310
online_ads   30,611  371       1.21     1,236.67
As of now it's taking 'catalog' as the baseline, and I need it to take 'control' as the baseline.
Just to put some more context on the code: I am using Metabase, so {{date_start}}, [[and peak15_booking.book_date < {{date_end}}]], and {{project}} are all variables used in Metabase.
I tried nth_value, the fetch function, and many others, but I'm not sure I'm even using them properly. I have been unable to find any answer to this.
I found a way to make this work, though I'm not sure it's the cleanest approach or whether I'll run into issues later. I still haven't figured out how to replace first_value, but I did add order by segment='control' desc twice: once in the OVER clause of the SELECT statement and once in the final ORDER BY. Again, I'm not sure this is the correct way to do it, but I thought I'd show that I did end up figuring it out. I wasn't sure whether I should have answered my own question or deleted it, but this might help someone else.
select segment,
count(*) as total,
count(b.c_guid) as bookings,
100.0 * count(b.c_guid) / count(*) as percent,
100.0 * count(b.c_guid) / first_value(count(b.c_guid)) over ( order by segment='control' desc, segment asc ) as comp
from mailing_tests
left join (
select distinct g.contact_guid as c_guid
from guest g
inner join booking
on booking.guid = g.booking_guid
where booking.book_date >= {{date_start}}
[[and booking.book_date < {{date_end}}]]
and booking.status in ('Booked')
) b
on mailing_tests.guid = b.c_guid
where {{project}}
group by segment
order by segment='control' desc, segment asc
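If you'd rather not bend the window's ORDER BY at all, another option is to pin the baseline with a filtered window aggregate. This is a sketch assuming Postgres 9.4+ (for FILTER), untested against the Metabase setup:

```sql
-- Replace the first_value expression: take the 'control' group's count
-- over all groups, regardless of sort order.
100.0 * count(b.c_guid)
      / max(count(b.c_guid)) FILTER (WHERE segment = 'control') OVER () AS comp
```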

Postgresql SUM calculated column

I am trying to write some SQL to calculate the worth of a user's inventory and have managed to get it to work up to the final step.
SELECT DISTINCT ON (pricing_cards.card_id)
(inventory_cards.nonfoil * pricing_cards.nonfoil) + (inventory_cards.foil * pricing_cards.foil) as x
FROM inventory_cards
INNER JOIN pricing_cards ON pricing_cards.card_id = inventory_cards.card_id
WHERE inventory_cards.user_id = 1
ORDER BY pricing_cards.card_id, pricing_cards.date DESC;
The code above brings back a single column with the correct calculation for each card. I now need to sum this column, but I keep getting errors when I try.
Adding SUM((inventory_cards.nonfoil * pricing_cards.nonfoil) + (inventory_cards.foil * pricing_cards.foil)) throws the following error
ERROR: column "pricing_cards.card_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 6: ORDER BY pricing_cards.card_id, pricing_cards.date DESC;
Adding GROUP BY pricing_cards.card_id, pricing_cards.date seems to fix the errors but is returning the same column of calculated values.
so:
SELECT DISTINCT ON (pricing_cards.card_id)
SUM((inventory_cards.nonfoil * pricing_cards.nonfoil) + (inventory_cards.foil * pricing_cards.foil)) as x
FROM inventory_cards
INNER JOIN pricing_cards ON pricing_cards.card_id = inventory_cards.card_id
WHERE inventory_cards.user_id = 1
GROUP BY pricing_cards.card_id, pricing_cards.date
ORDER BY pricing_cards.card_id, pricing_cards.date DESC;
Returns:
x
0.71
29.92
25.67
171.20
0.32
0.26
I suggest you use a subquery to get the latest pricing data, then join and sum:
SELECT
SUM(inventory_cards.nonfoil * latest_pricing.nonfoil + inventory_cards.foil * latest_pricing.foil)
FROM inventory_cards
INNER JOIN (
SELECT DISTINCT ON (card_id)
card_id, nonfoil, foil
FROM pricing_cards
ORDER BY pricing_cards.card_id, pricing_cards.date DESC
) AS latest_pricing USING (card_id)
WHERE inventory_cards.user_id = 1
For alternatives in the subquery, see also Select first row in each GROUP BY group? and Optimize GROUP BY query to retrieve latest row per user.
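One of those alternatives spelled out: a LATERAL join (Postgres 9.3+) that fetches the latest price row per card; a sketch under the same column names as above:

```sql
SELECT SUM(i.nonfoil * p.nonfoil + i.foil * p.foil) AS total_worth
FROM inventory_cards i
CROSS JOIN LATERAL (
    SELECT nonfoil, foil
    FROM pricing_cards
    WHERE card_id = i.card_id
    ORDER BY date DESC
    LIMIT 1
) p
WHERE i.user_id = 1;
```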

Counting Number of Users Whose Average is Greater than X in Postgres

I am trying to find out the number of users who have scored an average of 80 or higher. I am using HAVING in my query, but it returns the matching rows themselves rather than their count.
The Schema looks like:
Results
user
test_no
question_no
score
My Query:
SELECT "user" FROM results WHERE (score >0) GROUP BY "user"
HAVING (sum(score) / count(distinct(test_no))) >= 80;
I get:
user
2
4
8
(3 rows)
Instead I would like to get 3 (the number of rows) as the output. If I do count("user"), I get the number of tests for each user.
I understand this is related to use Group By but I need it for my Having clause. Any suggestions how I can do this is appreciated.
Update: Here is some sample data: http://pastebin.com/k1nH5Wzh (-1 means unanswered)
Thanks!
The query you found is good. Some minor simplifications:
SELECT count(*) AS ct
FROM (
SELECT 1
FROM result
WHERE score > 0
GROUP BY user_id
HAVING (sum(score) / count(DISTINCT test_no)) >= 80
) sub
DISTINCT does not require parentheses.
You can SELECT a constant value in the subquery. The value is irrelevant, since you are only going to count the rows. Slightly shorter and cheaper.
Don't use the reserved word user as column name. That's asking for trouble. I am using user_id instead.
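For the record, the same metric can also be phrased as an average of per-test totals, since sum(score) / count(DISTINCT test_no) is the mean of each test's summed score (equivalent as long as every test has at least one row with score > 0):

```sql
SELECT count(*) AS ct
FROM (
   SELECT user_id
   FROM (
      SELECT user_id, test_no, sum(score) AS test_score
      FROM result
      WHERE score > 0
      GROUP BY user_id, test_no
   ) per_test
   GROUP BY user_id
   HAVING avg(test_score) >= 80
) sub;
```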
I am not sure if this is an efficient way to do it but this seems to be working.
SELECT COUNT(*) FROM
  (SELECT "user" FROM results WHERE (score > 0) GROUP BY "user"
   HAVING (sum(score) / count(distinct(test_no))) >= 80) q1;

SSRS 2005 column chart: show series label missing when data count is zero

I have a pretty simple chart with a likely common issue. I've searched for several hours on the interweb but only get so far in finding a similar situation.
The basics of what I'm pulling contain a created_by, a person_id and a risk score.
the risk score can be:
1 VERY LOW
2 LOW
3 MODERATE STABLE
4 MODERATE AT RISK
5 HIGH
6 VERY HIGH
I want to get a headcount of persons at each risk score and display a count even when it is 0 for that risk score, but SSRS 2005 likes to suppress zero counts.
I've tried this in the point labels
=IIF(IsNothing(count(Fields!person_id.value)),0,count(Fields!person_id.value))
Ex: I'm missing values for "1 LOW" as the creator does not have any "1 LOW" they've assigned risk scores for.
Here's a screenshot of what I get, but I'd like to have a column for a count even when it doesn't exist in the returned results.
@Nathan, here's an example scenario:
select professor.name, grades.score, student.person_id
from student
inner join grades on student.person_id = grades.person_id
inner join professor on student.professor_id = professor.professor_id
where
student.professor_id = @professor
Not all students are necessarily in the grades table.
I have a =Count(Fields!person_id.Value) for my data points & series is grouped on =Fields!score.Value
If there were a bunch of A, B and D grades but no C's or F's, how would I show labels for the non-existent counts?
In your example, the problem is that no results are returned for grades that are not linked to any students. To solve this ideally there would be a table in your source system which listed all the possible values of "score" (e.g. A - F) and you would join this into your query such that at least one row was returned for each possible value.
If such a table doesn't exist and the possible score values are known and static, then you could manually create a list of them in your query. In the example below I create a subquery that returns a combination of all professors and all possible scores (A - F) and then LEFT join this to the grades and students tables (left join means that the professor/score rows will be returned even if no students have those scores in the "grades" table).
SELECT
professor.name
, professorgrades.score
, student.person_id
FROM
(
SELECT professor_id, score
FROM professor
CROSS JOIN
(
SELECT 'A' AS score
UNION
SELECT 'B'
UNION
SELECT 'C'
UNION
SELECT 'D'
UNION
SELECT 'E'
UNION
SELECT 'F'
) availablegrades
) professorgrades
INNER JOIN professor ON professorgrades.professor_id = professor.professor_id
LEFT JOIN grades ON professorgrades.score = grades.score
LEFT JOIN student ON grades.person_id = student.person_id AND
professorgrades.professor_id = student.professor_id
WHERE professorgrades.professor_id = 1
See a live example of how this works here: SQLFIDDLE
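As a side note, on SQL Server 2008 or later (not 2005, which lacks row constructors) the UNION chain can be collapsed with VALUES; same result, just shorter:

```sql
SELECT professor_id, score
FROM professor
CROSS JOIN (
    VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F')
) AS availablegrades(score)
```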
Another approach: union a zero-count row for every risk score with a one-count row per person, then aggregate:
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
FROM (
SELECT RiskScoreId, 1 AS RiskCount
FROM People
UNION ALL
SELECT RiskScoreId, 0 AS RiskCount
FROM RiskScores
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId

Firebird get the list with all available id

In a table I have records with ids 2, 4, 5 and 8. How can I get a list with the missing values 1, 3, 6 and 7? I have tried it this way:
SELECT t1.id + 1
FROM table t1
WHERE NOT EXISTS (
SELECT *
FROM table t2
WHERE t2.id = t1.id + 1
)
but it's not working correctly: it doesn't return all the available positions. Is it possible without another table?
You can get all the missing ID's from a recursive CTE, like this:
with recursive numbers as (
select 1 number
from rdb$database
union all
select number+1
from rdb$database
join numbers on numbers.number < 1024
)
select n.number
from numbers n
where not exists (select 1
from table t
where t.id = n.number)
The number < 1024 condition in my example limits the query to the maximum recursion depth of 1024; beyond that, the query ends with an error. If you need more than 1024 consecutive IDs, you will have to either run the query multiple times, adjusting the interval of numbers generated, or write a different query that produces consecutive numbers without reaching that level of recursion, which is not too difficult to do.
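One such trick for generating numbers without deep recursion: cross-join a ten-row digits derived table with itself, so two copies already cover 1-100 and three copies cover 1-1000. A sketch, keeping table as a placeholder name as in the question:

```sql
with digits as (
  select 0 as d from rdb$database
  union all select 1 from rdb$database
  union all select 2 from rdb$database
  union all select 3 from rdb$database
  union all select 4 from rdb$database
  union all select 5 from rdb$database
  union all select 6 from rdb$database
  union all select 7 from rdb$database
  union all select 8 from rdb$database
  union all select 9 from rdb$database
)
select tens.d * 10 + ones.d + 1 as number
from digits tens
cross join digits ones
where not exists (
  select 1 from table t
  where t.id = tens.d * 10 + ones.d + 1
)
order by 1
```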