PostgreSQL group by and count on specific condition - postgresql

I have the following tables (example):
Analyze_Line
id game_id bet_result game_type
1 1 WIN 0
2 2 LOSE 0
3 3 WIN 0
4 4 LOSE 0
5 5 LOSE 0
6 6 WIN 0
Game
id league_id home_team_id away_team_id
1 1 1 2
2 2 2 3
3 3 3 4
4 1 1 2
5 2 2 3
6 3 3 4
Required Data:
league_id WIN LOSE GameCnt
1 1 1 2
2 0 2 2
3 2 0 2
The Analyze_Line table is joined with the Game table, and I can easily get GameCnt by grouping by league_id, but I am not sure how to calculate the WIN and LOSE counts from bet_result.

You can use a conditional expression inside an aggregate function to count the win and lose bet results separately per league.
select
    g.league_id,
    -- "else 0" keeps the sum at 0 instead of NULL when a league has no matching rows
    sum(case when a.bet_result = 'WIN' then 1 else 0 end) as win,
    sum(case when a.bet_result = 'LOSE' then 1 else 0 end) as lose,
    count(*) as gamecnt
from
    game g
    inner join analyze_line a on
        g.id = a.game_id
group by
    g.league_id
Since you don't mention your PostgreSQL version, I can't recommend the FILTER clause (it is not available in older PostgreSQL releases and is supported by few other databases), since it might not work for you.
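One detail worth checking in a conditional sum is the ELSE branch: a CASE without ELSE yields NULL for non-matching rows, so a league with no wins gets NULL rather than 0. Below is a minimal sketch of that difference. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, and the helper name league_counts is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; tables and data follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, league_id INTEGER);
    CREATE TABLE analyze_line (id INTEGER PRIMARY KEY, game_id INTEGER, bet_result TEXT);
    INSERT INTO game VALUES (1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3);
    INSERT INTO analyze_line VALUES
        (1, 1, 'WIN'), (2, 2, 'LOSE'), (3, 3, 'WIN'),
        (4, 4, 'LOSE'), (5, 5, 'LOSE'), (6, 6, 'WIN');
""")

def league_counts(else_branch):
    """Hypothetical helper: run the grouped query with or without an ELSE branch."""
    sql = f"""
        select g.league_id,
               sum(case when a.bet_result = 'WIN' then 1 {else_branch} end) as win,
               sum(case when a.bet_result = 'LOSE' then 1 {else_branch} end) as lose,
               count(*) as gamecnt
        from game g
        inner join analyze_line a on g.id = a.game_id
        group by g.league_id
        order by g.league_id
    """
    return conn.execute(sql).fetchall()

without_else = league_counts("")       # NULL (None) where nothing matched
with_else = league_counts("else 0")    # 0 where nothing matched
print(without_else)
print(with_else)
```

With the sample data, only the "else 0" variant reproduces the zeros shown in the required output.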

Adding to Kamil's answer: PostgreSQL introduced the FILTER clause in version 9.4, released about eight years ago (December 2014). At this point, I think it's safe enough to use in answers. IMHO, it's a tad more elegant than summing over a case expression, but it does have the drawback of being less portable: FILTER is part of the SQL standard, yet many other databases do not support it.
SELECT g.league_id,
COUNT(*) FILTER (WHERE a.bet_result = 'WIN') AS win,
COUNT(*) FILTER (WHERE a.bet_result = 'LOSE') AS lose,
COUNT(*) AS gamecnt
FROM game g
JOIN analyze_line a ON g.id = a.game_id
GROUP BY g.league_id
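To try the FILTER form without a PostgreSQL server, here is a small sketch that runs the same query in SQLite through Python's sqlite3 module. This is an assumption for illustration only: SQLite accepts FILTER on aggregates from version 3.30, and the data is the sample from the question.

```python
import sqlite3

# Assumption: the bundled SQLite is 3.30 or newer, which supports FILTER on aggregates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, league_id INTEGER);
    CREATE TABLE analyze_line (id INTEGER PRIMARY KEY, game_id INTEGER, bet_result TEXT);
    INSERT INTO game VALUES (1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3);
    INSERT INTO analyze_line VALUES
        (1, 1, 'WIN'), (2, 2, 'LOSE'), (3, 3, 'WIN'),
        (4, 4, 'LOSE'), (5, 5, 'LOSE'), (6, 6, 'WIN');
""")

filter_rows = conn.execute("""
    SELECT g.league_id,
           COUNT(*) FILTER (WHERE a.bet_result = 'WIN') AS win,
           COUNT(*) FILTER (WHERE a.bet_result = 'LOSE') AS lose,
           COUNT(*) AS gamecnt
    FROM game g
    JOIN analyze_line a ON g.id = a.game_id
    GROUP BY g.league_id
    ORDER BY g.league_id
""").fetchall()
print(filter_rows)
```

Because COUNT returns 0 when no row passes the filter, no extra ELSE or COALESCE is needed to get zeros for leagues without wins or losses.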

Related

SQL Select based on each row of previous select

I have a table with answers to different questions, all of them numbered. It has these columns: IdAnswer (unique for each answer in the table), IdUser (which won't repeat even if the same user answers the questions a second time), IdQuestion and Answer.
IdAnswer IdUser IdQuestion Answer
1 John 1 0
2 John 4 1
3 John 5 1
4 John 6 0
5 Bob 1 1
6 Bob 3 1
7 Bob 5 0
8 Mark 2 0
9 Mark 7 1
10 Mark 5 0
I'd like to select from this table all answers to a specific question (say, IdQuestion = 5), and also the last question each user answered just before question number 5.
In the end I need a table that should look like this:
IdAnswer IdUser IdQuestion Answer
2 John 4 1
3 John 5 1
6 Bob 3 1
7 Bob 5 0
9 Mark 7 1
10 Mark 5 0
I've managed to make this work using a cursor to iterate through each line from the first SELECT result (which filters by IdQuestion), but I'm not sure if this is the best (and fastest) way of doing it. Is there any more efficient way of achieving the same result?
And by the way, I'm using SQL Server Management Studio 2012.
Here is one way, using the LEAD function:
select *
from (
    select *,
           NextQ = Lead(IdQuestion) over (partition by IdUser order by IdAnswer)
    from yourtable
) a
where 5 in (IdQuestion, NextQ)
For older versions (LEAD requires SQL Server 2012 or later):
;WITH cte
AS (SELECT prev_id = Min(CASE WHEN IdQuestion = 5 THEN rn - 1 END) OVER (PARTITION BY IdUser), *
    FROM (SELECT rn = Row_number() OVER (PARTITION BY IdUser ORDER BY IdAnswer), *
          FROM yourtable) a)
SELECT *
FROM cte
WHERE rn IN (prev_id, prev_id + 1)
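As a quick check of the LEAD approach against the sample data, the sketch below runs an equivalent query in SQLite via Python's sqlite3 module. This is an assumption for illustration: SQLite is not SQL Server, so the column alias uses standard AS syntax instead of T-SQL's "NextQ = ..." form, and the table name yourtable is the placeholder from the answer.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (IdAnswer INTEGER, IdUser TEXT, IdQuestion INTEGER, Answer INTEGER)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", [
    (1, "John", 1, 0), (2, "John", 4, 1), (3, "John", 5, 1), (4, "John", 6, 0),
    (5, "Bob", 1, 1), (6, "Bob", 3, 1), (7, "Bob", 5, 0),
    (8, "Mark", 2, 0), (9, "Mark", 7, 1), (10, "Mark", 5, 0),
])

# Keep a row if it is question 5 itself, or if the user's next answer is question 5.
rows = conn.execute("""
    SELECT IdAnswer, IdUser, IdQuestion, Answer
    FROM (
        SELECT *,
               LEAD(IdQuestion) OVER (PARTITION BY IdUser ORDER BY IdAnswer) AS NextQ
        FROM yourtable
    ) a
    WHERE 5 IN (IdQuestion, NextQ)
    ORDER BY IdAnswer
""").fetchall()
for row in rows:
    print(row)
```

The result matches the table the question asks for: each user's answer to question 5 plus the answer given immediately before it.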

Find last occurring value within record in PostgreSQL

I'm not new to SQL, but I am new to PostgreSQL and am really struggling to adapt my current knowledge in a different environment.
I am trying to create a variable that captures whether or not someone stays active, skips, or churns within a 0/1 time series variable. For example, in the data below, my dataset would include the variables id,time, and voted, and I would create the variable "skipped":
id time voted skipped
1 1 1 active
1 2 0 skipped
1 3 1 active
2 1 1 active
2 2 0 churned
2 3 0 churned
3 1 1 active
3 2 1 active
3 3 0 churned
The rule for coding "skipped" is pretty simple: If 1 is the last record, the person is "active" and any zeroes count as "skipped", but if 0 is the last record, the person is "churned".
The person with id = 1 has a skip because voted is non-zero at time 3 after being 0 at time 2. In the other two cases, 0 is the final value, so they are "churned". Can anyone help? I've been noodling on it all day, and am hitting a wall.
This isn't particularly elegant, but it should meet your needs:
with votes as (
select
id, time, voted,
max(time) over (partition by id) as max_time
from voter_data
)
select
v1.id, v1.time, v1.voted,
case
when v1.voted = 1 then 'active'
when v2.voted = 1 then 'skipped'
else 'churned'
end as skipped
from
votes v1
join votes v2 on
v1.id = v2.id and
v1.max_time = v2.time
In a nutshell, we first figure out which is the last record for each voter id, and then we do a self-join on the resulting table so that every row can see the vote in that last record.
There is a chance this could produce multiple results -- if it's possible to have the same ID vote twice at the same time. If that's the case, you want row_number() instead of max().
Results on your data:
1 1 1 'active'
1 2 0 'skipped'
1 3 1 'active'
2 1 1 'active'
2 2 0 'churned'
2 3 0 'churned'
3 1 1 'active'
3 2 1 'active'
3 3 0 'churned'
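Window functions can help readability by avoiding the self-join. Note that LAST_VALUE needs a frame that covers the whole partition; with only ORDER BY, the default frame ends at the current row, so it would simply return the current row's value.
WITH
add_last_voted_status AS (
    SELECT
        *
        , LAST_VALUE(voted) OVER (
            PARTITION BY id
            ORDER BY time
            -- extend the frame to the end of the partition
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS last_voted_status
    FROM voter_data
)
SELECT
    id
    , time
    , voted
    , CASE
        -- a vote always counts as active, even if the person churns later
        WHEN voted = 1
            THEN 'active'
        WHEN last_voted_status = 1
            THEN 'skipped'
        WHEN last_voted_status = 0
            THEN 'churned'
        ELSE '?'
    END AS skipped
FROM add_last_voted_status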
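LAST_VALUE is sensitive to the window frame, which is easy to overlook. The following sketch compares the default frame with an explicit full-partition frame on the sample data. It assumes SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL; both use the same default frame when ORDER BY is present. The column names default_frame and full_frame are made up for the illustration.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL; data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voter_data (id INTEGER, time INTEGER, voted INTEGER);
    INSERT INTO voter_data VALUES
        (1, 1, 1), (1, 2, 0), (1, 3, 1),
        (2, 1, 1), (2, 2, 0), (2, 3, 0),
        (3, 1, 1), (3, 2, 1), (3, 3, 0);
""")

rows = conn.execute("""
    WITH s AS (
        SELECT id, time, voted,
               -- default frame: stops at the current row
               LAST_VALUE(voted) OVER (PARTITION BY id ORDER BY time) AS default_frame,
               -- explicit frame: runs to the end of the partition
               LAST_VALUE(voted) OVER (
                   PARTITION BY id ORDER BY time
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS full_frame
        FROM voter_data
    )
    SELECT id, time, voted, default_frame, full_frame,
           CASE WHEN voted = 1 THEN 'active'
                WHEN full_frame = 1 THEN 'skipped'
                ELSE 'churned'
           END AS skipped
    FROM s
    ORDER BY id, time
""").fetchall()

default_frame = [r[3] for r in rows]
full_frame = [r[4] for r in rows]
labels = [r[5] for r in rows]
print(default_frame)
print(full_frame)
print(labels)
```

With the default frame the "last" value is just each row's own vote, while the full frame gives the final vote per id, which is what the skipped/churned rule needs.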

combining results of CTEs

I have several CTEs. CTE1A counts the number of type A shops in area 1, CTE1B counts the number of type B shops in area 1, and so on up to CTE1D. Similarly, CTE2B counts the number of type B shops in area 2, and so on. The shop_types CTE selects all types of shops: A, B, C, D. How can I display a table that shows, for each area (column), how many shops of each type (rows) there are?
For example:
1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (SELECT DISTINCT shops.shop_type AS type FROM shops WHERE shops.shop_type!='-9999' AND shops.shop_type!='Other'),
cte1A AS (
SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
FROM regions
RIGHT JOIN shops
ON shops.shop_id=regions.shop_id
WHERE regions.region_id=1
AND shops.shop_type='A'
GROUP BY shops.shop_type,regions.region_id)
SELECT * FROM cte1A
I'm not entirely sure I understand what you are after, but it seems you are looking for something like this:
select sh.shop_type,
count(case when r.region_id = 1 then 1 end) as region_1_count,
count(case when r.region_id = 2 then 1 end) as region_2_count,
count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case expression for each region you want to have in the output.
If you are using Postgres 9.4 or later, you can replace the case expressions with a filter clause, which makes the intention a bit easier to understand (I think):
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.
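Since a single statement needs a fixed column list, one common workaround is to do it in two steps from client code: first query the region ids, then generate the pivot statement from them. The sketch below illustrates this under stated assumptions: SQLite (via Python's sqlite3 module) stands in for Postgres, the sample rows are invented, and the region ids are integers, so they can be formatted into the SQL text safely after an int() conversion.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (shop_id INTEGER PRIMARY KEY, shop_type TEXT);
    CREATE TABLE regions (shop_id INTEGER, region_id INTEGER);
    INSERT INTO shops VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B'), (5, 'C');
    INSERT INTO regions VALUES (1, 1), (2, 2), (3, 2), (4, 2), (5, 3);
""")

# Step 1: find out which regions exist right now.
region_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT region_id FROM regions ORDER BY region_id")]

# Step 2: generate one count(case ...) column per region.
# int() ensures only integer literals end up in the SQL text.
columns = ",\n       ".join(
    f"count(case when r.region_id = {int(rid)} then 1 end) as region_{int(rid)}_count"
    for rid in region_ids
)
sql = f"""
select sh.shop_type,
       {columns}
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type
"""
pivot = conn.execute(sql).fetchall()
print(pivot)
```

The column list is still fixed by the time the second statement executes; it is only the client code that adapts it to the regions found in step 1.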

How to optimize query

I have the same problem as described in "In SQL, how to select the top 2 rows for each group". The answer there works fine, but it takes too much time. How can I optimize this query?
Example:
sample_table
act_id: act_cnt:
1 1
2 1
3 1
4 1
5 1
6 3
7 3
8 3
9 4
a 4
b 4
c 4
d 4
e 4
Now I want to group it by act_cnt (or use some other approach) and select 2 rows from each group. Sample output:
act_id: act_cnt:
1 1
2 1
6 3
7 3
9 4
a 4
I am new to SQL. How to do it?
The answer you linked to uses an inefficient workaround for MySQL's lack of window functions.
Using a window function is most probably much faster as you only need to read the table once:
select name,
score
from (
select name,
score,
dense_rank() over (partition by name order by score desc) as rnk
from the_table
) t
where rnk <= 2;
SQLFiddle: http://sqlfiddle.com/#!15/b0198/1
Having an index on (name, score) should speed up this query.
Edit after the question (and the problem) has been changed
select act_id,
act_cnt
from (
select act_id,
act_cnt,
row_number() over (partition by act_cnt order by act_id) as rn
from sample_table
) t
where rn <= 2;
New SQLFiddle: http://sqlfiddle.com/#!15/fc44b/1
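For a quick local check of the row_number() query, here is a sketch that runs it against the sample data in SQLite through Python's sqlite3 module. This is an assumption for illustration: SQLite 3.25 or newer is needed for window functions, and act_id is stored as text because the sample contains letters.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (act_id TEXT, act_cnt INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?)", [
    ("1", 1), ("2", 1), ("3", 1), ("4", 1), ("5", 1),
    ("6", 3), ("7", 3), ("8", 3),
    ("9", 4), ("a", 4), ("b", 4), ("c", 4), ("d", 4), ("e", 4),
])

# Number the rows inside each act_cnt group and keep the first two of each.
top_two = conn.execute("""
    select act_id, act_cnt
    from (
        select act_id,
               act_cnt,
               row_number() over (partition by act_cnt order by act_id) as rn
        from sample_table
    ) t
    where rn <= 2
    order by act_cnt, act_id
""").fetchall()
print(top_two)
```

row_number() gives exactly two rows per group even when values tie, whereas dense_rank() in the earlier query can return more than two rows if several rows share the same score.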

tsql sum data and include default values for missing data

I would like a query that will show a sum of columns with a default value for missing data. For example assume I have a table as follows:
type_lookup:
id name
-----------
1 self
2 manager
3 peer
And a table as follows
data:
id type_lookup_id value
--------------------------
1 1 1
2 1 4
3 2 9
4 2 1
5 2 9
6 1 5
7 2 6
8 1 2
9 1 1
After running a query I would like a result set as follows:
type_lookup_id value
----------------------
1 13
2 25
3 0
I would like all rows in type_lookup table to be included in the result set - even if they don't appear in the data table.
It's a bit hard to read your data layout, but something like the following should do the trick:
-- type_lookup's key column is id; sum the value column and default to 0 when no rows match
SELECT tl.id AS type_lookup_id, tl.name, coalesce(sum(da.value), 0) AS how_much
FROM type_lookup tl
LEFT OUTER JOIN data da
    ON da.type_lookup_id = tl.id
GROUP BY tl.id, tl.name
ORDER BY tl.id
[EDIT]
...subsequently edited by changing count() to sum().
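To see how the outer join and the default value work together, here is a small sketch on the sample data. It assumes SQLite (via Python's sqlite3 module) as a stand-in for SQL Server; COALESCE and LEFT OUTER JOIN behave the same way in both for this query.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables and rows follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE type_lookup (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE data (id INTEGER PRIMARY KEY, type_lookup_id INTEGER, value INTEGER);
    INSERT INTO type_lookup VALUES (1, 'self'), (2, 'manager'), (3, 'peer');
    INSERT INTO data VALUES
        (1, 1, 1), (2, 1, 4), (3, 2, 9), (4, 2, 1), (5, 2, 9),
        (6, 1, 5), (7, 2, 6), (8, 1, 2), (9, 1, 1);
""")

# The LEFT JOIN keeps 'peer' even though it has no data rows;
# SUM over no rows is NULL, which COALESCE turns into 0.
totals = conn.execute("""
    SELECT tl.id AS type_lookup_id, tl.name, COALESCE(SUM(da.value), 0) AS value
    FROM type_lookup tl
    LEFT OUTER JOIN data da ON da.type_lookup_id = tl.id
    GROUP BY tl.id, tl.name
    ORDER BY tl.id
""").fetchall()
print(totals)
```

Without the COALESCE, the row for type 3 would still appear, but its sum would be NULL rather than the 0 requested in the question.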