PostgreSQL: select max or_no if date is the same

I have a table like:

sr_no | or_no | date
------+-------+------------
    1 |     1 | 2017-01-01
    1 |     2 | 2017-02-02
    1 |     3 | 2017-02-02
    2 |     1 | 2017-01-02
    2 |     2 | 2017-01-10
What I want: if the date field is the same for a given sr_no, then only the record with the max or_no should be fetched. The output should look like:

sr_no | or_no | date
------+-------+------------
    1 |     1 | 2017-01-01
    1 |     3 | 2017-02-02
    2 |     1 | 2017-01-02
    2 |     2 | 2017-01-10

Just use DISTINCT ON with a matching ORDER BY to keep the max(or_no) row per group (note: the distinct key has to include sr_no, otherwise rows with the same date but different sr_no would collapse into one):
with a as (select distinct on (sr_no, d) * from t order by sr_no, d, or_no desc)
select * from a order by sr_no, or_no;
sr_no | or_no | d
-------+-------+------------
1 | 1 | 2017-01-01
1 | 3 | 2017-02-02
2 | 1 | 2017-01-02
2 | 2 | 2017-01-10
(4 rows)
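Outside the database, the same keep-the-max-row-per-group logic can be sketched in plain Python (this is only an illustration of what DISTINCT ON does here, not part of the original answer):

```python
# Emulate DISTINCT ON (sr_no, d) ... ORDER BY sr_no, d, or_no DESC:
# for every (sr_no, date) pair, keep the row with the highest or_no.
rows = [
    (1, 1, "2017-01-01"),
    (1, 2, "2017-02-02"),
    (1, 3, "2017-02-02"),
    (2, 1, "2017-01-02"),
    (2, 2, "2017-01-10"),
]

best = {}
for sr_no, or_no, d in rows:
    key = (sr_no, d)
    # Replace the stored row whenever a higher or_no shows up for the same key.
    if key not in best or or_no > best[key][1]:
        best[key] = (sr_no, or_no, d)

result = sorted(best.values())  # final ORDER BY sr_no, or_no
print(result)
```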


Cumulative sum of multiple window functions

I have a table with the structure:
id | date | player_id | score
--------------------------------------
1 | 2019-01-01 | 1 | 1
2 | 2019-01-02 | 1 | 1
3 | 2019-01-03 | 1 | 0
4 | 2019-01-04 | 1 | 0
5 | 2019-01-05 | 1 | 1
6 | 2019-01-06 | 1 | 1
7 | 2019-01-07 | 1 | 0
8 | 2019-01-08 | 1 | 1
9 | 2019-01-09 | 1 | 0
10 | 2019-01-10 | 1 | 0
11 | 2019-01-11 | 1 | 1
I want to create two more columns, total_score and last_seven_days:
total_score is a rolling sum of the player's score.
last_seven_days is the sum of the score over the seven days before (and not including) each row's date.
I have written the following SQL query:
SELECT id,
       date,
       player_id,
       score,
       sum(score) OVER all_scores AS all_score,
       sum(score) OVER last_seven AS last_seven_score
FROM scores
WINDOW all_scores AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
       last_seven AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING);
and get the following output:
id | date | player_id | score | all_score | last_seven_score
------------------------------------------------------------------
1 | 2019-01-01 | 1 | 1 | |
2 | 2019-01-02 | 1 | 1 | 1 | 1
3 | 2019-01-03 | 1 | 0 | 2 | 2
4 | 2019-01-04 | 1 | 0 | 2 | 2
5 | 2019-01-05 | 1 | 1 | 2 | 2
6 | 2019-01-06 | 1 | 1 | 3 | 3
7 | 2019-01-07 | 1 | 0 | 4 | 4
8 | 2019-01-08 | 1 | 1 | 4 | 4
9 | 2019-01-09 | 1 | 0 | 5 | 4
10 | 2019-01-10 | 1 | 0 | 5 | 3
11 | 2019-01-11 | 1 | 1 | 5 | 3
I have realised that I need to change this:
last_seven AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING)
to use some sort of date-based frame instead of the row count 7, because counting rows will introduce errors whenever days are missing.
I.e. it would be nice to be able to write something like date - 2 days or date - 6 days.
I would also like to add columns such as 3 months, 6 months and 12 months later down the track, so it needs to be dynamic.
demo: db<>fiddle
Solution for Postgres 11+:
Using a RANGE interval, as @LaurenzAlbe did.
Solution for Postgres <11:
(Just presenting the "days" part; the "all_scores" part works the same.)
Joining the table against itself on the player_id and the relevant date range:
SELECT s1.*,
       (SELECT SUM(s2.score)
        FROM scores s2
        WHERE s2.player_id = s1.player_id
          AND s2."date" BETWEEN s1."date" - interval '7 days'
                            AND s1."date" - interval '1 day') AS last_seven_score
FROM scores s1;
You need to use a window frame defined by RANGE:
last_seven AS (PARTITION BY player_id
               ORDER BY date
               RANGE BETWEEN INTERVAL '7 days' PRECEDING
                         AND INTERVAL '1 day' PRECEDING)
This solution will work only from v11 on.
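The date-based frame can also be sketched outside SQL. The Python below (an illustration only, using the question's sample data for player 1) sums scores over the window [date - 7 days, date - 1 day], which is exactly what the RANGE frame expresses:

```python
from datetime import date, timedelta

# One row per day for player 1, mirroring the sample table:
# score 1 on Jan 1, 2, 5, 6, 8 and 11; 0 otherwise.
scores = {date(2019, 1, d): (1 if d in (1, 2, 5, 6, 8, 11) else 0)
          for d in range(1, 12)}

def last_seven(d):
    # Sum of scores on the seven calendar days before d, i.e. the frame
    # RANGE BETWEEN INTERVAL '7 days' PRECEDING AND INTERVAL '1 day' PRECEDING.
    return sum(scores.get(d - timedelta(days=k), 0) for k in range(1, 8))

for d in sorted(scores):
    print(d, scores[d], last_seven(d))
```

Because the lookup is by date rather than by row offset, missing days simply contribute nothing, which is the behaviour the asker wanted.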

How to group by each date with a certain condition in PostgreSQL?

I have a table like this in PostgreSQL. Each row shows that a customer subscribed to one of our products; for example, customer 1 paid for a 1-month subscription on 2019-07-03.

date       | product | period | subscriber_id | units
-----------+---------+--------+---------------+------
2019-07-03 | A       | 1Month |             1 |     1
2019-07-02 | A       | 1Year  |             2 |     1
2019-07-01 | B       | 1Year  |             1 |     1
2019-06-30 | B       | 1Month |             3 |     1
2019-06-30 | A       | 1Month |             4 |     1
2019-06-03 | B       | 1Month |             4 |     1
2019-06-03 | A       | 1Month |             1 |     1
I want to calculate the number of distinct subscribers whose subscription is still valid on each day. The result will look like:

base_date  | product | total_distinct_count
-----------+---------+---------------------
2019-07-03 | A       | 3
2019-07-03 | B       | 3
2019-07-02 | A       | 3
2019-07-02 | B       | 3
2019-07-01 | A       | 2
2019-07-01 | B       | 3
2019-06-30 | A       | 2
2019-06-30 | B       | 1
...
For instance, in the first row there are 3 different customers (1, 2, 4) who still subscribe to product A on 2019-07-03.
I've tried to group by each day with a distinct count:
SELECT date, COUNT(DISTINCT subscriber_id)
-- do some conditions
GROUP BY date, product
but I don't know how to group by with this condition. If there is a better way to solve this problem I would very much appreciate it!
This is pretty straightforward if you use date ranges.
CREATE TABLE subscription (
    date date,
    product text,
    period interval,
    subscriber_id int,
    units int
);
INSERT INTO SUBSCRIPTION VALUES
('2019-07-03', 'A' , '1 month', 1, 1),
('2019-07-02', 'A', '1 year', 2, 1),
('2019-07-01', 'B', '1 year', 1, 1),
('2019-06-30', 'B', '1 month', 3, 1),
('2019-06-30', 'A', '1 month', 4, 1),
('2019-06-03', 'B', '1 month', 4, 1),
('2019-06-03', 'A', '1 month', 1, 1);
-- First, get the list of dateranges, from 2019-06-03 to 2019-07-03 (or whatever you want)
WITH dates as (
    SELECT daterange(t::date, (t + interval '1' day)::date, '[)')
    FROM generate_series('2019-06-03'::timestamp without time zone,
                         '2019-07-03',
                         interval '1' day) as g(t)
)
SELECT lower(daterange)::date, count(distinct subscriber_id)
FROM dates
LEFT JOIN subscription ON daterange <@
    daterange(subscription.date,
              (subscription.date + period)::date)
GROUP BY daterange
;
lower | count
------------+-------
2019-06-03 | 2
2019-06-04 | 2
2019-06-05 | 2
2019-06-06 | 2
2019-06-07 | 2
2019-06-08 | 2
2019-06-09 | 2
2019-06-10 | 2
2019-06-11 | 2
2019-06-12 | 2
2019-06-13 | 2
2019-06-14 | 2
2019-06-15 | 2
2019-06-16 | 2
2019-06-17 | 2
2019-06-18 | 2
2019-06-19 | 2
2019-06-20 | 2
2019-06-21 | 2
2019-06-22 | 2
2019-06-23 | 2
2019-06-24 | 2
2019-06-25 | 2
2019-06-26 | 2
2019-06-27 | 2
2019-06-28 | 2
2019-06-29 | 2
2019-06-30 | 3
2019-07-01 | 3
2019-07-02 | 4
2019-07-03 | 4
(31 rows)
You could improve performance by storing (and indexing) the subscription valid time as a daterange instead of calculating it in the query.
EDIT: As Jay pointed out, I forgot to group by product:
WITH dates as (
    SELECT daterange(t::date, (t + interval '1' day)::date, '[)')
    FROM generate_series('2019-06-03'::timestamp without time zone,
                         '2019-07-03',
                         interval '1' day) as g(t)
)
SELECT lower(daterange)::date, product, count(distinct subscriber_id)
FROM dates
LEFT JOIN subscription ON daterange <@
    daterange(subscription.date,
              (subscription.date + period)::date)
GROUP BY daterange, product
;
lower | product | count
------------+---------+-------
2019-06-03 | A | 1
2019-06-03 | B | 1
2019-06-04 | A | 1
2019-06-04 | B | 1
2019-06-05 | A | 1
2019-06-05 | B | 1
2019-06-06 | A | 1
2019-06-06 | B | 1
2019-06-07 | A | 1
2019-06-07 | B | 1
2019-06-08 | A | 1
2019-06-08 | B | 1
2019-06-09 | A | 1
2019-06-09 | B | 1
2019-06-10 | A | 1
2019-06-10 | B | 1
2019-06-11 | A | 1
2019-06-11 | B | 1
2019-06-12 | A | 1
2019-06-12 | B | 1
2019-06-13 | A | 1
2019-06-13 | B | 1
2019-06-14 | A | 1
2019-06-14 | B | 1
2019-06-15 | A | 1
2019-06-15 | B | 1
2019-06-16 | A | 1
2019-06-16 | B | 1
2019-06-17 | A | 1
2019-06-17 | B | 1
2019-06-18 | A | 1
2019-06-18 | B | 1
2019-06-19 | A | 1
2019-06-19 | B | 1
2019-06-20 | A | 1
2019-06-20 | B | 1
2019-06-21 | A | 1
2019-06-21 | B | 1
2019-06-22 | A | 1
2019-06-22 | B | 1
2019-06-23 | A | 1
2019-06-23 | B | 1
2019-06-24 | A | 1
2019-06-24 | B | 1
2019-06-25 | A | 1
2019-06-25 | B | 1
2019-06-26 | A | 1
2019-06-26 | B | 1
2019-06-27 | A | 1
2019-06-27 | B | 1
2019-06-28 | A | 1
2019-06-28 | B | 1
2019-06-29 | A | 1
2019-06-29 | B | 1
2019-06-30 | A | 2
2019-06-30 | B | 2
2019-07-01 | A | 2
2019-07-01 | B | 3
2019-07-02 | A | 3
2019-07-02 | B | 3
2019-07-03 | A | 3
2019-07-03 | B | 2
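The containment test the daterange join performs can be sketched in Python as well. This is only an illustration: it approximates '1 month' and '1 year' with fixed day counts (30 and 365), whereas the SQL uses true calendar intervals.

```python
from datetime import date, timedelta

# (start date, product, period length in days, subscriber_id).
# 30/365 days stand in for '1 month'/'1 year' in this sketch.
subs = [
    (date(2019, 7, 3),  "A", 30,  1),
    (date(2019, 7, 2),  "A", 365, 2),
    (date(2019, 7, 1),  "B", 365, 1),
    (date(2019, 6, 30), "B", 30,  3),
    (date(2019, 6, 30), "A", 30,  4),
    (date(2019, 6, 3),  "B", 30,  4),
    (date(2019, 6, 3),  "A", 30,  1),
]

def active(day, product):
    # Distinct subscribers whose half-open range [start, start + period)
    # contains `day` -- the same test as daterange <@ daterange(...).
    return len({sid for start, prod, days, sid in subs
                if prod == product and start <= day < start + timedelta(days=days)})

print(active(date(2019, 7, 3), "A"))  # subscribers 1, 2 and 4 -> 3
```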

postgres tablefunc, sales data grouped by product, with crosstab of months

TIL about tablefunc and crosstab. At first I wanted to "group data by columns" but that doesn't really mean anything.
My product sales look like this
product_id | units | date
-----------------------------------
10 | 1 | 1-1-2018
10 | 2 | 2-2-2018
11 | 3 | 1-1-2018
11 | 10 | 1-2-2018
12 | 1 | 2-1-2018
13 | 10 | 1-1-2018
13 | 10 | 2-2-2018
I would like to produce a table of products with months as columns
product_id | 01-01-2018 | 02-01-2018 | etc.
-----------------------------------
10 | 1 | 2
11 | 13 | 0
12 | 0 | 1
13 | 20 | 0
First I would group by month, then invert and group by product, but I cannot figure out how to do this.
After enabling the tablefunc extension,
SELECT product_id, coalesce("2018-1-1", 0) as "2018-1-1"
, coalesce("2018-2-1", 0) as "2018-2-1"
FROM crosstab(
$$SELECT product_id, date_trunc('month', date)::date as month, sum(units) as units
FROM test
GROUP BY product_id, month
ORDER BY 1$$
, $$VALUES ('2018-1-1'::date), ('2018-2-1')$$
) AS ct (product_id int, "2018-1-1" int, "2018-2-1" int);
yields
| product_id | 2018-1-1 | 2018-2-1 |
|------------+----------+----------|
| 10 | 1 | 2 |
| 11 | 13 | 0 |
| 12 | 0 | 1 |
| 13 | 10 | 10 |
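The two stages of the crosstab (aggregate per month, then spread months across columns with a 0 default) can be sketched in Python for illustration; dates are read as month-day-year, as in the post:

```python
from collections import defaultdict
from datetime import date

# (product_id, units, sale date)
sales = [
    (10, 1, date(2018, 1, 1)), (10, 2, date(2018, 2, 2)),
    (11, 3, date(2018, 1, 1)), (11, 10, date(2018, 1, 2)),
    (12, 1, date(2018, 2, 1)),
    (13, 10, date(2018, 1, 1)), (13, 10, date(2018, 2, 2)),
]

# Step 1: the inner query -- sum units per (product, month-truncated date).
monthly = defaultdict(int)
for pid, units, d in sales:
    monthly[(pid, d.replace(day=1))] += units

# Step 2: the crosstab -- one row per product, one column per listed month,
# missing cells filled with 0 (the COALESCE in the outer query).
months = [date(2018, 1, 1), date(2018, 2, 1)]
pivot = {pid: [monthly.get((pid, m), 0) for m in months]
         for pid in sorted({p for p, _ in monthly})}
print(pivot)
```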

Grouping by rolling date interval in Netezza

I have a table in Netezza that looks like this
Date Stock Return
2015-01-01 A xxx
2015-01-02 A xxx
2015-01-03 A 0
2015-01-04 A 0
2015-01-05 A xxx
2015-01-06 A xxx
2015-01-07 A xxx
2015-01-08 A xxx
2015-01-09 A xxx
2015-01-10 A 0
2015-01-11 A 0
2015-01-12 A xxx
2015-01-13 A xxx
2015-01-14 A xxx
2015-01-15 A xxx
2015-01-16 A xxx
2015-01-17 A 0
2015-01-18 A 0
2015-01-19 A xxx
2015-01-20 A xxx
The data represents stock returns for various stocks and dates. What I need to do is group the data by a given interval, and by the day within that interval. Another difficulty is that weekends (the 0s) have to be discounted (ignoring public holidays). And the start date of the first interval should be an arbitrary date.
For example, my output should look something like this:
Interval Q01 Q02 Q03 Q04 Q05
1 xxx xxx xxx xxx xxx
2 xxx xxx xxx xxx xxx
3 xxx xxx xxx xxx xxx
4 xxx xxx xxx xxx xxx
This output represents intervals of 5 working days, with averaged returns as results. In terms of the raw data above, with a start date of 1st Jan, the 1st interval includes the 1st/2nd/5th/6th/7th (the 3rd and 4th are weekend days and are ignored): Q01 would be the 1st, Q02 the 2nd, Q03 the 5th, etc. The second interval covers the 8th/9th/12th/13th/14th.
What I tried, unsuccessfully, is:
CEIL(CAST(EXTRACT(DOY FROM DATE) AS FLOAT) / CAST(10 AS FLOAT)) AS Interval
EXTRACT(DAY FROM DATE) % 10 AS DAYinInterval
I also tried playing around with rolling counters, and, to handle variable start dates, setting my DOY to zero with something like this:
CEIL(CAST(EXTRACT(DOY FROM DATE) - EXTRACT(DOY FROM 'start-date') AS FLOAT) / CAST(10 AS FLOAT)) AS Interval
The one thing that came closest to what I would expect is this:
SUM(Number) OVER (PARTITION BY STOCK ORDER BY DATE ASC ROWS 10 PRECEDING) AS Counter
Unfortunately it goes from 1 to 10 followed by 11s, where it should start again from 1 after 10.
I would love to see how this can be implemented in an elegant way. Thanks!
I'm not entirely sure I understand the question, but I think I might, so I'm going to take a swing at this with some windowed aggregates and subqueries.
Here's the sample data, plugging in some random non-zero data for weekdays.
DATE | STOCK | RETURN
------------+-------+--------
2015-01-01 | A | 16
2015-01-02 | A | 80
2015-01-03 | A | 0
2015-01-04 | A | 0
2015-01-05 | A | 60
2015-01-06 | A | 25
2015-01-07 | A | 12
2015-01-08 | A | 1
2015-01-09 | A | 81
2015-01-10 | A | 0
2015-01-11 | A | 0
2015-01-12 | A | 35
2015-01-13 | A | 20
2015-01-14 | A | 69
2015-01-15 | A | 72
2015-01-16 | A | 89
2015-01-17 | A | 0
2015-01-18 | A | 0
2015-01-19 | A | 100
2015-01-20 | A | 67
(20 rows)
Here's my swing at it, with embedded comments.
select avg(return),
       date_period,
       day_period
from (
    -- use row_number to generate a sequential value for each DOW,
    -- with a WHERE to filter out the weekends
    select date,
           stock,
           return,
           date_period,
           row_number() over (partition by date_period order by date asc) day_period
    from (
        -- bin out the entries by date_period, using the first_value of the
        -- entire set as the starting point, modulo 7
        select date,
               stock,
               return,
               date + (first_value(date) over (order by date asc) - date) % 7 date_period
        from stocks
        where date >= '2015-01-01'
        -- setting the starting period date here
    ) foo
    where extract(dow from date) not in (1,7)
) foo
group by date_period, day_period
order by date_period asc;
The results:
AVG | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
16.000000 | 2015-01-01 | 1
80.000000 | 2015-01-01 | 2
60.000000 | 2015-01-01 | 3
25.000000 | 2015-01-01 | 4
12.000000 | 2015-01-01 | 5
1.000000 | 2015-01-08 | 1
81.000000 | 2015-01-08 | 2
35.000000 | 2015-01-08 | 3
20.000000 | 2015-01-08 | 4
69.000000 | 2015-01-08 | 5
72.000000 | 2015-01-15 | 1
89.000000 | 2015-01-15 | 2
100.000000 | 2015-01-15 | 3
67.000000 | 2015-01-15 | 4
(14 rows)
Changing the starting date to '2015-01-03' to see if it adjusts properly:
...
from stocks
where date >= '2015-01-03'
...
And the results:
AVG | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
60.000000 | 2015-01-03 | 1
25.000000 | 2015-01-03 | 2
12.000000 | 2015-01-03 | 3
1.000000 | 2015-01-03 | 4
81.000000 | 2015-01-03 | 5
35.000000 | 2015-01-10 | 1
20.000000 | 2015-01-10 | 2
69.000000 | 2015-01-10 | 3
72.000000 | 2015-01-10 | 4
89.000000 | 2015-01-10 | 5
100.000000 | 2015-01-17 | 1
67.000000 | 2015-01-17 | 2
(12 rows)
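The core trick (anchor each date to a 7-day bin relative to the start date, then drop weekends and number the survivors) can be sketched in Python for illustration, using the answer's sample returns:

```python
from datetime import date, timedelta

start = date(2015, 1, 1)
# Returns keyed by date, Jan 1-20 2015; weekends carry 0 as in the post.
returns = {start + timedelta(days=i): r for i, r in enumerate(
    [16, 80, 0, 0, 60, 25, 12, 1, 81, 0, 0, 35, 20, 69, 72, 89, 0, 0, 100, 67])}

# Bin each weekday into the 7-day period anchored at `start` (the SQL answer's
# date_period); list order within a bin gives the day_period numbering.
bins = {}
for d in sorted(returns):
    if d.isoweekday() in (6, 7):      # drop Saturday/Sunday
        continue
    period = d - timedelta(days=(d - start).days % 7)
    bins.setdefault(period, []).append(returns[d])

for period, vals in sorted(bins.items()):
    print(period, vals)
```

Each bin ends up holding the five working-day returns of one interval, which is what the AVG/GROUP BY then summarises per (date_period, day_period).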

PostgreSQL Query to get pivot result

I have a log table that looks like this:
rpt_id | shipping_id | shop_id | status | create_time
-------------------------------------------------------------
1 | 1 | 600 | 1 | 2013-12-01 01:06:50
2 | 1 | 600 | 0 | 2013-12-01 01:06:55
3 | 1 | 600 | 1 | 2013-12-02 10:00:30
4 | 2 | 600 | 1 | 2013-12-02 10:00:30
5 | 1 | 601 | 1 | 2013-12-02 11:20:10
6 | 2 | 601 | 1 | 2013-12-02 11:20:10
7 | 1 | 601 | 0 | 2013-12-03 09:10:10
8 | 3 | 602 | 1 | 2013-12-03 13:15:58
And I want to use a single query to make it look like this:
shipping_id | total_activate | total_deactivate
-----------------------------------------------
1 | 2 | 2
2 | 2 | 0
3 | 1 | 0
How should I query this?
Note:
Status = 1 = Activate
Status = 0 = Deactivate
Rule for counting total activate/deactivate: look at the log table above. rpt_id 1 & 3 have the same shop_id, shipping_id and status, so they should count only once. See the result table: shipping_id 1 is only activated by 2 shops, shop_id 600 and 601.
Can you guys advise me how to write the query? Thanks for the help :D
Try this:
select shipping_id,
       sum(case when status=1 then 1 else 0 end) as total_activate,
       sum(case when status=0 then 1 else 0 end) as total_deactivate
from (select distinct shipping_id,
                      shop_id,
                      status
      from test) a
group by shipping_id
order by shipping_id
See it here at fiddle: http://sqlfiddle.com/#!15/f15fd/4
I did not put the date in the query, as it is not important for the result.
Yes, thanks... I also figured it out already; you can do it this way too:
SELECT shipping_id,
       COUNT(DISTINCT CASE WHEN status = 1 THEN shop_id END) AS total_activate,
       COUNT(DISTINCT CASE WHEN status = 0 THEN shop_id END) AS total_deactivate
FROM test
GROUP BY shipping_id
ORDER BY shipping_id
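The deduplicating effect of COUNT(DISTINCT CASE ...) can be sketched in Python for illustration: each shop is counted at most once per (shipping_id, status), which is why the duplicate log rows 1 and 3 contribute a single activation.

```python
from collections import defaultdict

# (rpt_id, shipping_id, shop_id, status) from the log table;
# create_time is irrelevant to the counts.
log = [
    (1, 1, 600, 1), (2, 1, 600, 0), (3, 1, 600, 1), (4, 2, 600, 1),
    (5, 1, 601, 1), (6, 2, 601, 1), (7, 1, 601, 0), (8, 3, 602, 1),
]

# Sets give the DISTINCT semantics: adding the same shop twice is a no-op.
activate, deactivate = defaultdict(set), defaultdict(set)
for _, shipping, shop, status in log:
    (activate if status == 1 else deactivate)[shipping].add(shop)

for shipping in sorted(set(activate) | set(deactivate)):
    print(shipping, len(activate[shipping]), len(deactivate[shipping]))
```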