PostgreSQL Query to get pivot result - postgresql

I have a log table that looks like this:
rpt_id | shipping_id | shop_id | status | create_time
-------------------------------------------------------------
1 | 1 | 600 | 1 | 2013-12-01 01:06:50
2 | 1 | 600 | 0 | 2013-12-01 01:06:55
3 | 1 | 600 | 1 | 2013-12-02 10:00:30
4 | 2 | 600 | 1 | 2013-12-02 10:00:30
5 | 1 | 601 | 1 | 2013-12-02 11:20:10
6 | 2 | 601 | 1 | 2013-12-02 11:20:10
7 | 1 | 601 | 0 | 2013-12-03 09:10:10
8 | 3 | 602 | 1 | 2013-12-03 13:15:58
I want to use a single query to produce this result:
shipping_id | total_activate | total_deactivate
-----------------------------------------------
1 | 2 | 2
2 | 2 | 0
3 | 1 | 0
How should I query this?
Note:
status = 1 means activate
status = 0 means deactivate
Counting rule for total activate / deactivate: rows with the same shop_id, shipping_id and status count only once. For example, rpt_id 1 and 3 in the log table above share all three values, so they count as one. In the result table, shipping_id 1 is activated by only 2 shops, shop_id 600 and 601.
Can you advise me on how to write the query? Thanks for the help :D

Try this:
select shipping_id,
       sum(case when status = 1 then 1 else 0 end) as total_activate,
       sum(case when status = 0 then 1 else 0 end) as total_deactivate
from (select distinct shipping_id,
             shop_id,
             status
      from test) a
group by shipping_id
order by shipping_id
See it here at fiddle: http://sqlfiddle.com/#!15/f15fd/4
I left create_time out of the query because it does not affect the result.

Yes, thanks. I had already figured it out as well; you can also do it this way:
SELECT
shipping_id,
COUNT(DISTINCT CASE WHEN status = 1 THEN shop_id END) AS total_activate,
COUNT(DISTINCT CASE WHEN status = 0 THEN shop_id END) AS total_deactivate
FROM
test
GROUP BY
shipping_id
ORDER BY
shipping_id
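To check that the two queries above agree, here is a minimal sketch that runs both against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption made only so the example is self-contained; both statements use plain SQL that should behave the same way in PostgreSQL.

```python
import sqlite3

# Sample rows copied from the log table in the question.
rows = [
    (1, 1, 600, 1, "2013-12-01 01:06:50"),
    (2, 1, 600, 0, "2013-12-01 01:06:55"),
    (3, 1, 600, 1, "2013-12-02 10:00:30"),
    (4, 2, 600, 1, "2013-12-02 10:00:30"),
    (5, 1, 601, 1, "2013-12-02 11:20:10"),
    (6, 2, 601, 1, "2013-12-02 11:20:10"),
    (7, 1, 601, 0, "2013-12-03 09:10:10"),
    (8, 3, 602, 1, "2013-12-03 13:15:58"),
]

# In-memory SQLite database standing in for the PostgreSQL table "test".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (rpt_id INTEGER, shipping_id INTEGER, shop_id INTEGER,"
    " status INTEGER, create_time TEXT)"
)
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", rows)

# Variant 1: de-duplicate (shipping_id, shop_id, status) first, then sum flags.
distinct_subquery = """
    SELECT shipping_id,
           SUM(CASE WHEN status = 1 THEN 1 ELSE 0 END) AS total_activate,
           SUM(CASE WHEN status = 0 THEN 1 ELSE 0 END) AS total_deactivate
    FROM (SELECT DISTINCT shipping_id, shop_id, status FROM test) a
    GROUP BY shipping_id
    ORDER BY shipping_id
"""

# Variant 2: count distinct shops per status; CASE yields NULL for the other
# status, and COUNT ignores NULLs.
count_distinct = """
    SELECT shipping_id,
           COUNT(DISTINCT CASE WHEN status = 1 THEN shop_id END) AS total_activate,
           COUNT(DISTINCT CASE WHEN status = 0 THEN shop_id END) AS total_deactivate
    FROM test
    GROUP BY shipping_id
    ORDER BY shipping_id
"""

result_a = conn.execute(distinct_subquery).fetchall()
result_b = conn.execute(count_distinct).fetchall()
print(result_a)
print(result_b)
```

Both variants rely on the same idea: a (shipping_id, shop_id, status) combination counts once no matter how often it was logged.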

Related

Filter a sum of values until a certain threshold is reached

Stuck. Need SO :)
Consider the following distribution of values.
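| ID | CNT | SEC | SHOW (bool) |
|----|-----|-----|-------------|
| 1  | 10  | 1   |             |
| 2  | 1   | 1   |             |
| 3  | 25  | 1   |             |
| 4  | 1   | 1   |             |
| 5  | 2   | 1   |             |
| 6  | 10  | 1   |             |
| 7  | 50  | 2   |             |
| 8  | 90  | 2   |             |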
My goal is to filter by sec and then
sort by cnt ascending,
sort by id ascending
and then flag all rows with cnt < 5 as show = false, and keep flagging the following rows until the sum of cnt over all hidden rows (show = false) is >= 5.
So the sum of all "hidden" rows must never be < 5.
Expected outcome for sec=1:
| id | cnt | cnt_sum | show |
|----|-----|---------|-------|
| 2 | 1 | 1 | false |
| 4 | 1 | 2 | false |
| 5 | 2 | 4 | false |
| 1 | 10 | 14 | false | -- The sum of all hidden rows before this point is 4
| 6 | 10 | 24 | true | -- The total of all hidden rows is now >= 5.
| 3 | 25 | 49 | true |
Expected outcome for sec=2:
| id | cnt | cnt_sum | show |
|----|-----|---------|-------|
| 7 | 50 | 50 | true |
| 8 | 90 | 140 | true |
I can already sort the values and compute the running sums. What I have not figured out is how to determine the cutoff point after which hiding is no longer necessary.
I am already doing this in client code and want to migrate it to SQL.
LAG() will help here. You can write your query like this:
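with cte as (
    select
        id, cnt, sec,
        -- running total of cnt within each sec, ordered by cnt, then id
        sum(cnt) over (partition by sec order by cnt, id) as sum_
    from tbl
)
select
    id, cnt, sum_,
    case
        -- hide the row while the running total, or the total before this row, is below 5
        when sum_ < 5 or lag(sum_) over (partition by sec order by cnt, id) < 5 then 'false'
        else 'true'
    end as "show"
from cte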
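The cutoff rule can also be traced outside the database. The following is a rough sketch in plain Python that mirrors the running sum and the LAG() check on the sample rows; the function name flag_rows and the threshold parameter are illustrative and not part of the original query.

```python
from itertools import groupby

# (id, cnt, sec) rows copied from the question.
rows = [
    (1, 10, 1), (2, 1, 1), (3, 25, 1), (4, 1, 1),
    (5, 2, 1), (6, 10, 1), (7, 50, 2), (8, 90, 2),
]


def flag_rows(rows, threshold=5):
    """Return {sec: [(id, cnt, cnt_sum, show), ...]} ordered by cnt, then id."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[2], r[1], r[0]))
    for sec, group in groupby(ordered, key=lambda r: r[2]):
        running = 0       # plays the role of sum(cnt) over (...)
        previous = None   # plays the role of lag(sum_); None stands for NULL
        flagged = []
        for row_id, cnt, _sec in group:
            running += cnt
            # Hidden while the running total, or the total before this row,
            # is still below the threshold. A NULL lag never hides a row.
            hidden = running < threshold or (
                previous is not None and previous < threshold
            )
            flagged.append((row_id, cnt, running, not hidden))
            previous = running
        result[sec] = flagged
    return result


flagged = flag_rows(rows)
for sec, sec_rows in flagged.items():
    print(sec, sec_rows)
```

Checking the previous running total is what hides the row with id 1: the hidden rows before it only add up to 4, so one more row has to be hidden to reach 5.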

Cumulative sum of multiple window functions

I have a table with the structure:
id | date | player_id | score
--------------------------------------
1 | 2019-01-01 | 1 | 1
2 | 2019-01-02 | 1 | 1
3 | 2019-01-03 | 1 | 0
4 | 2019-01-04 | 1 | 0
5 | 2019-01-05 | 1 | 1
6 | 2019-01-06 | 1 | 1
7 | 2019-01-07 | 1 | 0
8 | 2019-01-08 | 1 | 1
9 | 2019-01-09 | 1 | 0
10 | 2019-01-10 | 1 | 0
11 | 2019-01-11 | 1 | 1
I want to create two more columns, total_score and last_seven_days.
total_score is a running sum of the score for each player_id.
last_seven_days is the score for the last seven days, counting back from the date.
I have written the following SQL query:
SELECT id,
date,
player_id,
score,
sum(score) OVER all_scores AS all_score,
sum(score) OVER last_seven AS last_seven_score
FROM scores
WINDOW all_scores AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
last_seven AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING);
and get the following output:
id | date | player_id | score | all_score | last_seven_score
------------------------------------------------------------------
1 | 2019-01-01 | 1 | 1 | |
2 | 2019-01-02 | 1 | 1 | 1 | 1
3 | 2019-01-03 | 1 | 0 | 2 | 2
4 | 2019-01-04 | 1 | 0 | 2 | 2
5 | 2019-01-05 | 1 | 1 | 2 | 2
6 | 2019-01-06 | 1 | 1 | 3 | 3
7 | 2019-01-07 | 1 | 0 | 4 | 4
8 | 2019-01-08 | 1 | 1 | 4 | 4
9 | 2019-01-09 | 1 | 0 | 5 | 4
10 | 2019-01-10 | 1 | 0 | 5 | 3
11 | 2019-01-11 | 1 | 1 | 5 | 3
I have realised that I need to change this
last_seven AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING)
so that, instead of the fixed row count 7, it uses some sort of date arithmetic, because a fixed number of rows will introduce errors.
That is, it would be nice to be able to write date - 2 days or date - 6 days.
Later on I would also like to add columns for 3 months, 6 months and 12 months, so it needs to be dynamic.
Solution for Postgres 11+:
Use a RANGE interval, as @LaurenzAlbe did.
Solution for Postgres < 11:
(Only the "days" part is shown; the "all_scores" part stays the same.)
Join the table against itself on player_id and the relevant date range:
SELECT s1.*,
(SELECT SUM(s2.score)
FROM scores s2
WHERE s2.player_id = s1.player_id
AND s2."date" BETWEEN s1."date" - interval '7 days' AND s1."date" - interval '1 days')
FROM scores s1
You need to define the window with RANGE instead of ROWS, ordered by date:
last_seven AS (PARTITION BY player_id
ORDER BY date
RANGE BETWEEN INTERVAL '7 days' PRECEDING
AND INTERVAL '1 day' PRECEDING)
This solution will work only from v11 on.
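To make the frame semantics concrete, here is a small sketch in plain Python that computes the same "7 days preceding to 1 day preceding" sum by comparing dates instead of counting rows. The helper name range_sum is made up for illustration. Because the sample data has one row per day, the result matches the last_seven_score column from the question; with gaps in the dates, the row-based and date-based frames would differ.

```python
from datetime import date, timedelta

# (id, date, player_id, score) rows copied from the question.
daily_scores = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
scores = [
    (i, date(2019, 1, i), 1, s) for i, s in enumerate(daily_scores, start=1)
]


def range_sum(rows, current, lo_days, hi_days):
    """Sum scores of the same player dated within
    [current date - lo_days, current date - hi_days], both ends inclusive."""
    _, cur_date, player, _ = current
    start = cur_date - timedelta(days=lo_days)
    end = cur_date - timedelta(days=hi_days)
    in_frame = [s for _, d, p, s in rows if p == player and start <= d <= end]
    # SUM over an empty frame is NULL in SQL; None stands in for that here.
    return sum(in_frame) if in_frame else None


last_seven = [range_sum(scores, row, 7, 1) for row in scores]
print(last_seven)
```

Other periods follow the same pattern by changing the two offsets, which is the flexibility the question asks for.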

Calculate won, tie and lost games in postgresql

I have two tables "matches" and "opponents".
Matches
id | date
---+------------
1 | 2016-03-21 21:00:00
2 | 2016-03-22 09:00:00
...
Opponents
(score is null if not played)
id | match_id | team_id | score
---+----------+---------+------------
1 | 1 | 1 | 0
2 | 1 | 2 | 1
3 | 2 | 3 | 1
4 | 2 | 4 | 1
4 | 3 | 1 |
4 | 3 | 2 |
....
The goal is to create the following table
Team | won | tie | lost | total
-----+-----+-----+------+----------
2 | 1 | 0 | 0 | 1
3 | 0 | 1 | 0 | 1
4 | 0 | 1 | 0 | 1
1 | 0 | 0 | 1 | 1
Postgres v9.5
How do I do this? (I'm open to moving "score" somewhere else in my model if that makes sense.)
Divide et impera my son
with teams as (
select distinct team_id from opponents
),
teamgames as (
select t.team_id, o.match_id, o.score as team_score, oo.score as opponent_score
from teams t
join opponents o on t.team_id = o.team_id
join opponents oo on (oo.match_id = o.match_id and oo.id != o.id)
),
rankgames as (
select
team_id,
case
when team_score > opponent_score then 1
else 0
end as win,
case
when team_score = opponent_score then 1
else 0
end as tie,
case
when team_score < opponent_score then 1
else 0
end as loss
from teamgames
),
rank as (
select
team_id, sum(win) as win, sum(tie) as tie, sum(loss) as loss,
sum( win * 3 + tie * 1 ) as score
from rankgames
group by team_id
order by score desc
)
select * from rank;
Note 1: You probably don't need the first CTE (teams), since you probably already have a table with one row per team.
Note 2: I think you can achieve the same result with a single query, but this way the steps are clearer.
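As a rough illustration of the single-query variant mentioned in Note 2, the sketch below self-joins opponents and also produces the total column from the question. It runs on Python's built-in sqlite3 module purely as a stand-in for PostgreSQL. Two assumptions are made here: the last two sample rows get ids 5 and 6 (the question repeats id 4), and the two sides of a match are told apart by team_id rather than by id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opponents (id INTEGER, match_id INTEGER,"
    " team_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO opponents VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 0),
        (2, 1, 2, 1),
        (3, 2, 3, 1),
        (4, 2, 4, 1),
        (5, 3, 1, None),  # not played yet; id 5 is assumed
        (6, 3, 2, None),  # not played yet; id 6 is assumed
    ],
)

# Pair every team row with the other team of the same match, skip matches
# that have no score yet, then count outcomes per team.
query = """
    SELECT o.team_id,
           SUM(CASE WHEN o.score > oo.score THEN 1 ELSE 0 END) AS won,
           SUM(CASE WHEN o.score = oo.score THEN 1 ELSE 0 END) AS tie,
           SUM(CASE WHEN o.score < oo.score THEN 1 ELSE 0 END) AS lost,
           COUNT(*) AS total
    FROM opponents o
    JOIN opponents oo
      ON oo.match_id = o.match_id AND oo.team_id <> o.team_id
    WHERE o.score IS NOT NULL AND oo.score IS NOT NULL
    GROUP BY o.team_id
    ORDER BY won DESC, tie DESC, o.team_id
"""
standings = conn.execute(query).fetchall()
for row in standings:
    print(row)
```

The WHERE clause keeps unplayed matches out of total; without it, COUNT(*) would also count the rows whose score is still NULL.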

Postgresql: Select sum with different conditions

I have two tables:
I. Table 1 looks like this:
------------------------------------------
codeid | pos | neg | category
-----------------------------------------
1 | 10 | 3 | begin2016
1 | 3 | 5 | justhere
3 | 7 | 7 | justthere
4 | 1 | 1 | else
4 | 12 | 0 | begin2015
4 | 5 | 12 | begin2013
1 | 2 | 50 | now
2 | 5 | 33 | now
5 | 33 | 0 | Begin2011
5 | 11 | 7 | begin2000
II. Table 2 looks like this:
------------------------------------------
codeid | codedesc | codegroupid
-----------------------------------------
1 | road runner | 1
2 | bike warrior | 2
3 | lazy driver | 4
4 | clever runner | 1
5 | worker | 3
6 | smarty | 1
7 | sweety | 3
8 | sweeper | 1
I want a single result that satisfies two (or more) conditions:
sum pos and neg where codegroupid IN ('1', '2', '3'),
but do not sum pos and neg if category is like 'begin%'.
So the result will look like this:
------------------------------------------
codeid | codedesc | sumpos | sumneg
-----------------------------------------
1 | road runner | 5 | 55 => (sumpos = 3 + 2; the row with pos = 10 has a category like 'begin%', so it is not summed)
2 | bike warrior | 5 | 33
4 | clever runner | 1 | 1
5 | worker | 0 | 0 => (sumpos = sumneg = 0, because every category for codeid 5 is ILIKE 'begin%')
Group by codeid, codedesc.
sumpos is sum(pos) where category NOT ILIKE 'begin%'; if category ILIKE 'begin%', the pos value counts as zero (0).
sumneg is sum(neg) where category NOT ILIKE 'begin%'; if category ILIKE 'begin%', the neg value counts as zero.
Any ideas how to do it?
Try:
SELECT
    b.codeid,
    b.codedesc,
    -- ILIKE (case-insensitive) so that 'Begin2011' is excluded as well
    sum(CASE WHEN a.category ILIKE 'begin%' THEN 0 ELSE a.pos END) AS sumpos,
    sum(CASE WHEN a.category ILIKE 'begin%' THEN 0 ELSE a.neg END) AS sumneg
FROM
    table1 AS a
JOIN
    table2 AS b ON a.codeid = b.codeid
WHERE b.codegroupid IN (1, 2, 3)
GROUP BY
    b.codeid,
    b.codedesc;
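The sample data contains a category spelled 'Begin2011', so whether the pattern match is case-sensitive changes the result for codeid 5. The sketch below replays the join and the conditional sum in plain Python to show the difference; the function name conditional_sums and its case_sensitive flag are illustrative only.

```python
from collections import defaultdict

# (codeid, pos, neg, category) rows copied from Table 1.
table1 = [
    (1, 10, 3, "begin2016"), (1, 3, 5, "justhere"), (3, 7, 7, "justthere"),
    (4, 1, 1, "else"), (4, 12, 0, "begin2015"), (4, 5, 12, "begin2013"),
    (1, 2, 50, "now"), (2, 5, 33, "now"),
    (5, 33, 0, "Begin2011"), (5, 11, 7, "begin2000"),
]

# codeid -> (codedesc, codegroupid), copied from Table 2.
table2 = {
    1: ("road runner", 1), 2: ("bike warrior", 2), 3: ("lazy driver", 4),
    4: ("clever runner", 1), 5: ("worker", 3), 6: ("smarty", 1),
    7: ("sweety", 3), 8: ("sweeper", 1),
}


def conditional_sums(case_sensitive):
    """Sum pos/neg per code, counting 'begin%' categories as zero."""
    totals = defaultdict(lambda: [0, 0])
    for codeid, pos, neg, category in table1:
        codedesc, codegroupid = table2[codeid]
        if codegroupid not in (1, 2, 3):
            continue  # WHERE codegroupid IN (1, 2, 3)
        name = category if case_sensitive else category.lower()
        entry = totals[(codeid, codedesc)]  # group exists even if all rows are zeroed
        if not name.startswith("begin"):
            entry[0] += pos
            entry[1] += neg
    return [(c, d, p, n) for (c, d), (p, n) in sorted(totals.items())]


insensitive = conditional_sums(case_sensitive=False)  # behaves like ILIKE
sensitive = conditional_sums(case_sensitive=True)     # behaves like LIKE
print(insensitive)
print(sensitive)
```

Only the case-insensitive run reproduces the expected row for codeid 5 (0 and 0); the case-sensitive run lets the 'Begin2011' row through and reports 33 and 0.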

PostgreSQL Join Two Tables by Nearest Date

I have a large single table of sent emails with dates and outcomes, and I'd like to match each row with the last time that email was sent and a specific outcome occurred (here, open = 1). This needs to be done in PostgreSQL. For example:
Initial table:
id | sent_dt | bounced | open | clicked | unsubscribe
1 | 2015-01-01 | 1 | 0 | 0 | 0
1 | 2015-01-02 | 0 | 1 | 1 | 0
1 | 2015-01-03 | 0 | 1 | 1 | 0
2 | 2015-01-01 | 0 | 1 | 0 | 0
2 | 2015-01-02 | 1 | 0 | 0 | 0
2 | 2015-01-03 | 0 | 1 | 0 | 0
2 | 2015-01-04 | 0 | 1 | 0 | 1
Result table:
id | sent_dt | bounced| open | clicked | unsubscribe| previous_time
1 | 2015-01-01 | 1 | 0 | 0 | 0 | NULL
1 | 2015-01-02 | 0 | 1 | 1 | 0 | NULL
1 | 2015-01-03 | 0 | 1 | 1 | 0 | 2015-01-02
2 | 2015-01-01 | 0 | 1 | 0 | 0 | NULL
2 | 2015-01-02 | 1 | 0 | 0 | 0 | 2015-01-01
2 | 2015-01-03 | 0 | 1 | 0 | 0 | 2015-01-01
2 | 2015-01-04 | 0 | 1 | 0 | 1 | 2015-01-03
I have tried using LAG, but I don't know how to apply the condition that open must equal 1 while still returning all rows. I also tried a many-to-many join on id and then taking the minimum DATEDIFF, but that essentially squares the size of my table and takes far too long to compute (> 7 hrs). There are several answers that would work in other SQL databases, but none that I can see working in PostgreSQL.
Thanks for any help guys!
You can use ROW_NUMBER() for this: number the rows per id by sent_dt, join each row to the row sent immediately before it, and keep that row's date only if it has open = 1.
SELECT t.*, s.sent_dt AS previous_time
FROM
    (SELECT p.*,
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY sent_dt DESC) AS rnk
     FROM YourTable p) t
LEFT OUTER JOIN
    (SELECT p.*,
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY sent_dt DESC) AS rnk
     FROM YourTable p) s
    -- same id, and s is the row sent immediately before t (rnk counts from the newest row)
    ON (t.id = s.id AND t.rnk = s.rnk - 1 AND s."open" = 1)
First I create a CTE, openFilter, that holds the dates on which the mail was opened.
Then I join the mail table with that filter to get the open dates earlier than each email. Finally, I filter out everything except the latest earlier open date.
WITH openFilter as (
SELECT m."id", m."sent_dt"
FROM mail m
WHERE "open" = 1
)
SELECT m."id",
to_char(m."sent_dt", 'YYYY-MM-DD'),
"bounced", "open", "clicked", "unsubscribe",
to_char(o."sent_dt", 'YYYY-MM-DD') previous_time
FROM mail m
LEFT JOIN openFilter o
ON m."id" = o."id"
AND m."sent_dt" > o."sent_dt"
WHERE o."sent_dt" = (SELECT MAX(t."sent_dt")
FROM openFilter t
WHERE t."id" = m."id"
AND t."sent_dt" < m."sent_dt")
OR o."sent_dt" IS NULL
Output
| id | to_char | bounced | open | clicked | unsubscribe | previous_time |
|----|------------|---------|------|---------|-------------|---------------|
| 1 | 2015-01-01 | 1 | 0 | 0 | 0 | (null) |
| 1 | 2015-01-02 | 0 | 1 | 1 | 0 | (null) |
| 1 | 2015-01-03 | 0 | 1 | 1 | 0 | 2015-01-02 |
| 2 | 2015-01-01 | 0 | 1 | 0 | 0 | (null) |
| 2 | 2015-01-02 | 1 | 0 | 0 | 0 | 2015-01-01 |
| 2 | 2015-01-03 | 0 | 1 | 0 | 0 | 2015-01-01 |
| 2 | 2015-01-04 | 0 | 1 | 0 | 1 | 2015-01-03 |
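The rule behind previous_time can be stated as: for each row, take the most recent earlier sent_dt of the same id that has open = 1. As a cross-check of the output above, here is a minimal single-pass sketch in plain Python; the function name previous_open_dates is hypothetical, and only the columns needed for the rule are kept.

```python
from datetime import date

# (id, sent_dt, open) taken from the initial table in the question.
mail = [
    (1, date(2015, 1, 1), 0),
    (1, date(2015, 1, 2), 1),
    (1, date(2015, 1, 3), 1),
    (2, date(2015, 1, 1), 1),
    (2, date(2015, 1, 2), 0),
    (2, date(2015, 1, 3), 1),
    (2, date(2015, 1, 4), 1),
]


def previous_open_dates(rows):
    """Return (id, sent_dt, previous_time) for every row, in id/date order."""
    last_open = {}  # id -> latest sent_dt seen so far with open = 1
    result = []
    for mail_id, sent_dt, is_open in sorted(rows):
        # Look up before updating, so a row never matches itself.
        result.append((mail_id, sent_dt, last_open.get(mail_id)))
        if is_open == 1:
            last_open[mail_id] = sent_dt
    return result


previous = previous_open_dates(mail)
for row in previous:
    print(row)
```

Because each row is visited once after sorting, this avoids the many-to-many join that the question describes as too slow.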