Calculate won, tie and lost games in PostgreSQL

I have two tables "matches" and "opponents".
Matches
id | date
---+------------
1 | 2016-03-21 21:00:00
2 | 2016-03-22 09:00:00
...
Opponents
(score is null if not played)
id | match_id | team_id | score
---+----------+---------+------
 1 |        1 |       1 |     0
 2 |        1 |       2 |     1
 3 |        2 |       3 |     1
 4 |        2 |       4 |     1
 5 |        3 |       1 |
 6 |        3 |       2 |
...
The goal is to create the following table
Team | won | tie | lost | total
-----+-----+-----+------+----------
2 | 1 | 0 | 0 | 1
3 | 0 | 1 | 0 | 1
4 | 0 | 1 | 0 | 1
1 | 0 | 0 | 1 | 1
Postgres v9.5
How do I do this? (I'm open to moving "score" somewhere else in my model if that makes sense.)

Divide et impera, my son:
with teams as (
    select distinct team_id from opponents
),
teamgames as (
    select t.team_id, o.match_id, o.score as team_score, oo.score as opponent_score
    from teams t
    join opponents o on t.team_id = o.team_id
    join opponents oo on (oo.match_id = o.match_id and oo.id != o.id)
),
rankgames as (
    select
        team_id,
        case when team_score > opponent_score then 1 else 0 end as win,
        case when team_score = opponent_score then 1 else 0 end as tie,
        case when team_score < opponent_score then 1 else 0 end as loss
    from teamgames
),
rank as (
    select
        team_id, sum(win) as win, sum(tie) as tie, sum(loss) as loss,
        sum(win * 3 + tie * 1) as score
    from rankgames
    group by team_id
    order by score desc
)
select * from rank;
Note 1: You probably don't need the first CTE ("teams"), as you most likely already have a table with one record per team.
Note 2: I think you can also achieve the same result with a single query, but this way the steps are clearer.
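For reference, a single-query variant of the idea in Note 2 might look like this (a sketch only, assuming the same opponents table; FILTER needs PostgreSQL 9.4+, so it is fine on 9.5):
-- Sketch: self-join opponents to pair each row with its opposite row,
-- then aggregate wins/ties/losses per team. Unplayed matches (NULL scores)
-- contribute 0 to every bucket and are excluded from total.
select o.team_id,
       sum(case when o.score > oo.score then 1 else 0 end) as won,
       sum(case when o.score = oo.score then 1 else 0 end) as tie,
       sum(case when o.score < oo.score then 1 else 0 end) as lost,
       count(*) filter (where o.score is not null and oo.score is not null) as total
from opponents o
join opponents oo
  on oo.match_id = o.match_id
 and oo.id <> o.id
group by o.team_id
order by sum(case when o.score > oo.score then 1 else 0 end) * 3
       + sum(case when o.score = oo.score then 1 else 0 end) desc;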


Filter a sum of values until a certain threshold is reached

DbFiddle
Stuck. Need SO :)
Consider the following distribution of values.
ID | CNT | SEC | SHOW (bool)
---+-----+-----+------------
 1 |  10 |   1 |
 2 |   1 |   1 |
 3 |  25 |   1 |
 4 |   1 |   1 |
 5 |   2 |   1 |
 6 |  10 |   1 |
 7 |  50 |   2 |
 8 |  90 |   2 |
My goal is to filter by sec and then
sort by cnt ascending,
sort by id ascending,
and then flag/filter rows as show = false where cnt < 5 and until the sum of cnt of all hidden rows (show = false) is >= 5.
So the sum of all "hidden" rows may never be < 5.
Expected outcome for sec=1:
| id | cnt | cnt_sum | show |
|----|-----|---------|-------|
| 2 | 1 | 1 | false |
| 4 | 1 | 2 | false |
| 5 | 2 | 4 | false |
| 1 | 10 | 14 | false | -- The sum of all hidden rows before this point is 4
| 6 | 10 | 24 | true | -- The total of all hidden rows is now >= 5.
| 3 | 25 | 49 | true |
Expected outcome for sec=2:
| id | cnt | cnt_sum | show |
|----|-----|---------|-------|
| 7 | 50 | 50 | true |
| 8 | 90 | 140 | true |
I can already sort the values and create the running sums, etc. What I have not figured out is how to determine the cutoff point, i.e. when hiding is no longer necessary.
I am already doing this in client code and want to migrate it to SQL.
LAG() will help to achieve what you want here. You can write your query like this:
with cte as (
    select
        id, cnt, sec,
        sum(cnt) over (partition by sec order by cnt, id) as sum_
    from tbl
)
select
    id, cnt, sum_,
    case
        when sum_ < 5 or lag(sum_) over (partition by sec order by cnt, id) < 5 then 'false'
        else 'true'
    end as "show"
from cte
DEMO
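If you want only the visible rows rather than a flag, the same LAG() condition can be inverted and moved into a WHERE clause (a sketch, assuming the same tbl and column names as above):
-- Sketch: keep only the rows that would be flagged show = true.
-- prev_sum is NULL for the first row per sec; coalesce treats that as "big enough".
with sums as (
    select id, cnt, sec,
           sum(cnt) over (partition by sec order by cnt, id) as sum_
    from tbl
), flagged as (
    select id, cnt, sec, sum_,
           lag(sum_) over (partition by sec order by cnt, id) as prev_sum
    from sums
)
select id, cnt, sec, sum_
from flagged
where sum_ >= 5
  and coalesce(prev_sum, 5) >= 5
order by sec, cnt, id;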

Generate a histogram of values grouped by a column

I have the following data in a reviews table for a certain set of items, using a score system that ranges from 0 to 100:
+-----------+---------+-------+
| review_id | item_id | score |
+-----------+---------+-------+
| 1 | 1 | 90 |
+-----------+---------+-------+
| 2 | 1 | 40 |
+-----------+---------+-------+
| 3 | 1 | 10 |
+-----------+---------+-------+
| 4 | 2 | 90 |
+-----------+---------+-------+
| 5 | 2 | 90 |
+-----------+---------+-------+
| 6 | 2 | 70 |
+-----------+---------+-------+
| 7 | 3 | 80 |
+-----------+---------+-------+
| 8 | 3 | 80 |
+-----------+---------+-------+
| 9 | 3 | 80 |
+-----------+---------+-------+
| 10 | 3 | 80 |
+-----------+---------+-------+
| 11 | 4 | 10 |
+-----------+---------+-------+
| 12 | 4 | 30 |
+-----------+---------+-------+
| 13 | 4 | 50 |
+-----------+---------+-------+
| 14 | 4 | 80 |
+-----------+---------+-------+
I am trying to create a histogram of the score values with five bins. My goal is to generate a histogram per item. To create a histogram over the entire table, it is possible to use the width_bucket function, and this can also be tuned to operate on a per-item basis:
SELECT item_id, g.n as bucket, COUNT(m.score) as count
FROM generate_series(1, 5) g(n)
LEFT JOIN review as m ON width_bucket(score, 0, 100, 4) = g.n
GROUP BY item_id, g.n
ORDER BY item_id, g.n;
However, the result looks like this:
+---------+--------+-------+
| item_id | bucket | count |
+---------+--------+-------+
| 1 | 5 | 1 |
+---------+--------+-------+
| 1 | 3 | 1 |
+---------+--------+-------+
| 1 | 1 | 1 |
+---------+--------+-------+
| 2 | 5 | 2 |
+---------+--------+-------+
| 2 | 4 | 2 |
+---------+--------+-------+
| 3 | 4 | 4 |
+---------+--------+-------+
| 4 | 1 | 1 |
+---------+--------+-------+
| 4 | 2 | 1 |
+---------+--------+-------+
| 4 | 3 | 1 |
+---------+--------+-------+
| 4 | 4 | 1 |
+---------+--------+-------+
That is, bins with no entries are not included. While this is not a bad solution, I would rather have all buckets, with 0 for those with no entries, or, even better, this structure:
+---------+----------+----------+----------+----------+----------+
| item_id | bucket_1 | bucket_2 | bucket_3 | bucket_4 | bucket_5 |
+---------+----------+----------+----------+----------+----------+
| 1 | 1 | 0 | 1 | 0 | 1 |
+---------+----------+----------+----------+----------+----------+
| 2 | 0 | 0 | 0 | 2 | 2 |
+---------+----------+----------+----------+----------+----------+
| 3 | 0 | 0 | 0 | 4 | 0 |
+---------+----------+----------+----------+----------+----------+
| 4 | 1 | 1 | 1 | 1 | 0 |
+---------+----------+----------+----------+----------+----------+
I prefer this solution as it uses a row per item (instead of 5n), which is simpler to query and minimizes memory consumption and data transfer costs. My current approach is as follows:
select item_id,
       sum(case when score >= 0  and score <= 19  then 1 else 0 end) as bucket_1,
       sum(case when score >= 20 and score <= 39  then 1 else 0 end) as bucket_2,
       sum(case when score >= 40 and score <= 59  then 1 else 0 end) as bucket_3,
       sum(case when score >= 60 and score <= 79  then 1 else 0 end) as bucket_4,
       sum(case when score >= 80 and score <= 100 then 1 else 0 end) as bucket_5
from review
group by item_id;
Even though this query satisfies my requirements, I am curious whether there might be a more elegant approach. So many CASE statements are not easy to read, and changes to the bin criteria would require updating every sum. I am also curious about any performance concerns this query might have.
The second query can be rewritten to use ranges to make editing and writing the query a bit easier:
with buckets (b1, b2, b3, b4, b5) as (
    values (
        int4range(0, 20), int4range(20, 40), int4range(40, 60), int4range(60, 80), int4range(80, 101)
    )
)
select item_id,
       count(*) filter (where b1 @> score) as bucket_1,
       count(*) filter (where b2 @> score) as bucket_2,
       count(*) filter (where b3 @> score) as bucket_3,
       count(*) filter (where b4 @> score) as bucket_4,
       count(*) filter (where b5 @> score) as bucket_5
from review
cross join buckets
group by item_id
order by item_id;
A range constructed with int4range(0, 20) includes the lower end and excludes the upper end, which is why the last bucket is built as int4range(80, 101): it keeps a score of 100 inside bucket_5.
The CTE named buckets creates only a single row, so the cross join does not change the number of rows coming from the review table.
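Another option along the same lines (a sketch, not part of the original answer) is to compute the bucket once with width_bucket and count it with FILTER, which keeps the bin criteria in a single place:
-- Sketch: one width_bucket expression, counted per bucket with FILTER.
-- width_bucket(score, 0, 100, 5) maps 0-19 -> 1, 20-39 -> 2, ..., 80-99 -> 5,
-- and exactly 100 -> 6, so bucket_5 also accepts bucket 6.
select item_id,
       count(*) filter (where width_bucket(score, 0, 100, 5) = 1)  as bucket_1,
       count(*) filter (where width_bucket(score, 0, 100, 5) = 2)  as bucket_2,
       count(*) filter (where width_bucket(score, 0, 100, 5) = 3)  as bucket_3,
       count(*) filter (where width_bucket(score, 0, 100, 5) = 4)  as bucket_4,
       count(*) filter (where width_bucket(score, 0, 100, 5) >= 5) as bucket_5
from review
group by item_id
order by item_id;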
I found this post useful
CREATE FUNCTION temp_histogram(table_name_or_subquery text, column_name text)
RETURNS TABLE(bucket int, "range" numrange, freq bigint, bar text)
AS $func$
BEGIN
RETURN QUERY EXECUTE format('
    WITH
    source AS (
        SELECT * FROM %s
    ),
    min_max AS (
        SELECT min(%s) AS min, max(%s) AS max FROM source
    ),
    temp_histogram AS (
        SELECT
            width_bucket(%s, min_max.min, min_max.max, 100) AS bucket,
            numrange(min(%s)::numeric, max(%s)::numeric, ''[]'') AS "range",
            count(%s) AS freq
        FROM source, min_max
        WHERE %s IS NOT NULL
        GROUP BY bucket
        ORDER BY bucket
    )
    SELECT
        bucket,
        "range",
        freq::bigint,
        repeat(''*'', (freq::float / (max(freq) over() + 1) * 15)::int) AS bar
    FROM temp_histogram',
    table_name_or_subquery,
    column_name,
    column_name,
    column_name,
    column_name,
    column_name,
    column_name,
    column_name
);
END
$func$ LANGUAGE plpgsql;
Adjust the bucket count (100 in the script above) to suit your needs.
Invoke it like this:
SELECT * FROM temp_histogram($table_name_or_subquery, $column_name);
Example:
SELECT * FROM temp_histogram('transactions_tbl', 'amount_colm');
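Since the first argument is interpolated as text, it can also be a parenthesized subquery instead of a table name (a sketch, reusing the review table from the earlier example; the alias keeps the derived table valid SQL):
-- Sketch: histogram over a subquery rather than a whole table.
SELECT * FROM temp_histogram('(SELECT score FROM review WHERE item_id = 1) AS r', 'score');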

Postgresql: Select sum with different conditions

I have two tables.
I. Table 1 looks like this:
------------------------------------------
codeid | pos | neg | category
-----------------------------------------
1 | 10 | 3 | begin2016
1 | 3 | 5 | justhere
3 | 7 | 7 | justthere
4 | 1 | 1 | else
4 | 12 | 0 | begin2015
4 | 5 | 12 | begin2013
1 | 2 | 50 | now
2 | 5 | 33 | now
5 | 33 | 0 | Begin2011
5 | 11 | 7 | begin2000
II. Table 2 looks like this:
------------------------------------------
codeid | codedesc | codegroupid
-----------------------------------------
1 | road runner | 1
2 | bike warrior | 2
3 | lazy driver | 4
4 | clever runner | 1
5 | worker | 3
6 | smarty | 1
7 | sweety | 3
8 | sweeper | 1
I want one result like this, with two (or more) conditions:
sum pos and neg where codegroupid IN ('1', '2', '3'),
but do not sum pos and neg if category is like 'begin%'.
So the result will look like this:
------------------------------------------
codeid | codedesc | sumpos | sumneg
-----------------------------------------
1 | road runner   | 5 | 55   => (sumpos = 3 + 2, because the pos = 10 row has a category like 'begin%', so it is not summed)
2 | bike warrior  | 5 | 33
4 | clever runner | 1 | 1
5 | worker        | 0 | 0    => (sumpos = sumneg = 0, because every codeid 5 row has a category ILIKE 'begin%')
Group by codeid, codedesc.
Sumpos is sum(pos) where category NOT ILIKE 'begin%'; if the category is ILIKE 'begin%', the pos value counts as zero (0).
Sumneg is sum(neg) where category NOT ILIKE 'begin%'; if the category is ILIKE 'begin%', the neg value counts as zero.
Any ideas how to do it?
Try:
SELECT
    b.codeid,
    b.codedesc,
    sum(CASE WHEN a.category ILIKE 'begin%' THEN 0 ELSE a.pos END) AS sumpos,
    sum(CASE WHEN a.category ILIKE 'begin%' THEN 0 ELSE a.neg END) AS sumneg
FROM table1 AS a
JOIN table2 AS b ON a.codeid = b.codeid
WHERE b.codegroupid IN (1, 2, 3)
GROUP BY
    b.codeid,
    b.codedesc;
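On PostgreSQL 9.4+, the same aggregation can also be written with a FILTER clause instead of CASE (a sketch, assuming the same table and column names):
-- Sketch: FILTER variant of the same aggregation.
-- coalesce turns the NULL that appears when every row is filtered out into 0.
SELECT
    b.codeid,
    b.codedesc,
    coalesce(sum(a.pos) FILTER (WHERE a.category NOT ILIKE 'begin%'), 0) AS sumpos,
    coalesce(sum(a.neg) FILTER (WHERE a.category NOT ILIKE 'begin%'), 0) AS sumneg
FROM table1 AS a
JOIN table2 AS b ON a.codeid = b.codeid
WHERE b.codegroupid IN (1, 2, 3)
GROUP BY b.codeid, b.codedesc;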

PostgreSQL Query to get pivot result

I have a log table that looks like this:
rpt_id | shipping_id | shop_id | status | create_time
-------------------------------------------------------------
1 | 1 | 600 | 1 | 2013-12-01 01:06:50
2 | 1 | 600 | 0 | 2013-12-01 01:06:55
3 | 1 | 600 | 1 | 2013-12-02 10:00:30
4 | 2 | 600 | 1 | 2013-12-02 10:00:30
5 | 1 | 601 | 1 | 2013-12-02 11:20:10
6 | 2 | 601 | 1 | 2013-12-02 11:20:10
7 | 1 | 601 | 0 | 2013-12-03 09:10:10
8 | 3 | 602 | 1 | 2013-12-03 13:15:58
And I want to use a single query to make it look like this:
shipping_id | total_activate | total_deactivate
-----------------------------------------------
1 | 2 | 2
2 | 2 | 0
3 | 1 | 0
How should I query this?
Note:
Status = 1 = Activate
Status = 0 = Deactivate
Counting rule for total activate/deactivate: look at the log table above. rpt_id 1 and 3 have the same shop_id, shipping_id and status, so they should only count as one. See the result table: shipping_id 1 is only activated by 2 shops, shop_id 600 and 601.
Can you advise me how to write this query? Thanks for the help :D
Try this:
select shipping_id,
       sum(case when status = 1 then 1 else 0 end) as total_activate,
       sum(case when status = 0 then 1 else 0 end) as total_deactivate
from (select distinct shipping_id,
                      shop_id,
                      status
      from test) a
group by shipping_id
order by shipping_id
See it here at fiddle: http://sqlfiddle.com/#!15/f15fd/4
I did not include the date in the query, as it is not important for the result.
Yes, thanks... I also figured it out already; you can do it this way too:
SELECT
    shipping_id,
    COUNT(DISTINCT CASE WHEN status = 1 THEN shop_id END) AS total_activate,
    COUNT(DISTINCT CASE WHEN status = 0 THEN shop_id END) AS total_deactivate
FROM test
GROUP BY shipping_id
ORDER BY shipping_id
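For what it's worth, the distinct counting can also be combined with FILTER on PostgreSQL 9.4+ (a sketch, using the same test table as above):
-- Sketch: FILTER with COUNT(DISTINCT ...), so repeated log rows for the same
-- (shipping_id, shop_id, status) still count only once.
SELECT shipping_id,
       COUNT(DISTINCT shop_id) FILTER (WHERE status = 1) AS total_activate,
       COUNT(DISTINCT shop_id) FILTER (WHERE status = 0) AS total_deactivate
FROM test
GROUP BY shipping_id
ORDER BY shipping_id;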

count() corresponding to max() of different values satisfying some condition

I have the following tables:
user_group
usergrp_id bigint Primary Key
usergrp_name text
user
user_id bigint Primary Key
user_name text
user_usergrp_id bigint
user_loc_id bigint
user_usergrp_id references usergrp_id in the user_group table.
user_loc_id references branch_id in the branch table.
branch
branch_id bigint Primary Key
branch_name text
branch_type smallint
branch_type defaults to 1, but may contain any value between 1 and 4.
user_projects
proj_id bigint Primary Key
proj_name text
proj_branch_id smallint
proj_branch_id references branch_id in the branch table.
user_approval
appr_id bigint Primary Key
appr_prjt_id bigint
appr_status smallint
appr_approval_by bigint
appr_approval_by references user_id in the user table.
appr_status may contain different status values (10, 20, 30, ...) for a single appr_prjt_id.
user_group
usergrp_id | usergrp_name
-------------------------
1 | Admin
2 | Manager
user
user_id | user_name | user_usergrp_id |user_loc_id
---------------------------------------------------
1 | John | 1 | 1
2 | Harry | 2 | 1
branch
branch_id | branch_name | branch_type
-------------------------------------
1 | location1 | 2
2 | location2 | 1
3 | location3 | 4
4 | location4 | 2
5 | location4 | 2
user_projects
proj_id | proj_name | proj_branch_id
------------------------------------
1 | test1 | 1
2 | test2 | 2
3 | test3 | 1
4 | test4 | 3
5 | test5 | 1
6 | test5 | 4
user_approval
appr_id | appr_prjt_id | appr_status | appr_approval_by
-------------------------------------------------------
1 | 1 | 10 | 1
2 | 1 | 20 | 1
3 | 1 | 30 | 1
4 | 2 | 10 | 2
5 | 3 | 10 | 1
6 | 3 | 20 | 2
7 | 4 | 10 | 1
8 | 4 | 20 | 1
Condition: the output must take the MAX() value of appr_status for each appr_prjt_id and count it.
I.e., in the table above, appr_prjt_id = 1 has 3 different statuses: 10, 20, 30. It must only be counted under status 30 in the output (not under 10 or 20), against the user group in the corresponding branch_name. The same applies to every other value of appr_prjt_id.
SQL Fiddle
Desired Output:
branch_name | usergrp_name | 10 | 20 | 30
------------+--------------+----+----+----
location1   | Admin        |  0 |  1 |  1
location1   | Manager      |  1 |  1 |  0
How can I do that?
SQL Fiddle
select
    branch_name, usergrp_name,
    sum((appr_status = 10)::integer) "10",
    sum((appr_status = 20)::integer) "20",
    sum((appr_status = 30)::integer) "30"
from
    (
        select distinct on (appr_prjt_id)
            appr_prjt_id, appr_approval_by, appr_status
        from user_approval
        order by 1, 3 desc
    ) ua
    inner join users u on ua.appr_approval_by = u.user_id
    inner join user_group ug on u.user_usergrp_id = ug.usergrp_id
    inner join branch b on u.user_loc_id = b.branch_id
group by branch_name, usergrp_name
order by usergrp_name
The classic solution, which works in most DBMSs, is to use a CASE expression:
select
    branch_name, usergrp_name,
    sum(case appr_status when 10 then 1 else 0 end) "10",
But PostgreSQL has a boolean type, and it can be cast to integer (boolean::integer), yielding 0 or 1, which makes for less verbose code.
In this case it is also possible to do a count instead of a sum:
select
    branch_name, usergrp_name,
    count(appr_status = 10 or null) "10",
I indeed prefer the count, but I have the impression that it is harder to understand. The trick is to know that count counts anything that is not null, and that (true or null) is true while (false or null) is null, so it counts whenever the condition is true.
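For comparison, here is an illustrative sketch (not from the original answer) showing the equivalent forms side by side, counting rows with appr_status = 10 in user_approval:
-- Sketch: four equivalent ways to count rows where appr_status = 10.
select
    sum(case appr_status when 10 then 1 else 0 end) as with_case,
    sum((appr_status = 10)::integer)                as with_boolean_cast,
    count(appr_status = 10 or null)                 as with_count_or_null,
    count(*) filter (where appr_status = 10)        as with_filter   -- PostgreSQL 9.4+
from user_approval;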