postgres tablefunc, sales data grouped by product, with crosstab of months - postgresql

TIL about tablefunc and crosstab. At first I wanted to "group data by columns" but that doesn't really mean anything.
My product sales look like this
product_id | units | date
-----------------------------------
10 | 1 | 1-1-2018
10 | 2 | 2-2-2018
11 | 3 | 1-1-2018
11 | 10 | 1-2-2018
12 | 1 | 2-1-2018
13 | 10 | 1-1-2018
13 | 10 | 2-2-2018
I would like to produce a table of products with months as columns
product_id | 01-01-2018 | 02-01-2018 | etc.
-----------------------------------
10 | 1 | 2
11 | 13 | 0
12 | 0 | 1
13 | 10 | 10
First I would group by month, then invert and group by product, but I cannot figure out how to do this.

After enabling the tablefunc extension (CREATE EXTENSION tablefunc;),
SELECT product_id
     , coalesce("2018-1-1", 0) AS "2018-1-1"
     , coalesce("2018-2-1", 0) AS "2018-2-1"
FROM   crosstab(
   $$SELECT product_id, date_trunc('month', date)::date AS month, sum(units) AS units
     FROM   test
     GROUP  BY product_id, month
     ORDER  BY 1$$
 , $$VALUES ('2018-1-1'::date), ('2018-2-1')$$
   ) AS ct (product_id int, "2018-1-1" int, "2018-2-1" int);
yields
| product_id | 2018-1-1 | 2018-2-1 |
|------------+----------+----------|
| 10 | 1 | 2 |
| 11 | 13 | 0 |
| 12 | 0 | 1 |
| 13 | 10 | 10 |
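To check the pivot logic outside the database, here is a minimal Python sketch of what the crosstab query computes: truncate each date to the first of its month, sum units per (product, month), and fill missing cells with 0. It assumes the sample dates are month-day-year, and the function name pivot_by_month is only illustrative; it is not part of tablefunc.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (product_id, units, sale date).
# Assumption: the question's dates are written month-day-year.
rows = [
    (10, 1, date(2018, 1, 1)),
    (10, 2, date(2018, 2, 2)),
    (11, 3, date(2018, 1, 1)),
    (11, 10, date(2018, 1, 2)),
    (12, 1, date(2018, 2, 1)),
    (13, 10, date(2018, 1, 1)),
    (13, 10, date(2018, 2, 2)),
]

def pivot_by_month(rows, months):
    """Sum units per (product, month) and emit one row per product."""
    totals = defaultdict(int)
    for product_id, units, day in rows:
        # day.replace(day=1) plays the role of date_trunc('month', date)::date.
        totals[(product_id, day.replace(day=1))] += units
    products = sorted({product_id for product_id, _, _ in rows})
    # Missing (product, month) cells default to 0, like coalesce(..., 0).
    return [(p, *(totals.get((p, m), 0) for m in months)) for p in products]

months = [date(2018, 1, 1), date(2018, 2, 1)]
table = pivot_by_month(rows, months)
for row in table:
    print(row)
```

The months list corresponds to the second crosstab argument: it fixes which columns exist and in what order, regardless of which months actually occur in the data.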

Related

How to sum for previous n number of days for a number of dates in MySQL

I have a list of dates, each with a value, in MySQL.
For each date I want to sum the value for this date and the previous 4 days.
I also want to sum the values for the start of that month to the present date. So for example:
For 07/02/2021 sum all values from 07/02/2021 to 01/02/2021
For 06/02/2021 sum all values from 06/02/2021 to 01/02/2021
For 31/01/2021 sum all values from 31/01/2021 to 01/01/2021
The output should look like:
Any help would be appreciated.
Thanks
In MySQL 8.0 you can use analytic (window) functions.
SELECT
  *,
  SUM(value) OVER (
    ORDER BY date
    ROWS BETWEEN 4 PRECEDING
             AND CURRENT ROW
  ) AS five_day_period,
  SUM(value) OVER (
    PARTITION BY DATE_FORMAT(date, '%Y-%m-01')
    ORDER BY date
  ) AS month_to_date
FROM
  your_table
In the first case, it sums the value column, in date order, starting 4 rows before the current row and ending on the current row. This counts rows, not days, so it assumes one row per date.
In the second case, there's no ROWS BETWEEN, so the frame defaults to everything from the first row of the partition up to the current row. Instead, we add a PARTITION BY, which says to treat all rows in the same calendar month separately from rows in any other calendar month. Thus, "all rows before the current one" only looks back to the first row in the partition, which is the first row in the current month.
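The two frames described above can be sketched in plain Python. This is only an illustration of the semantics, assuming one row per date and rows already sorted by date; the function name window_sums and the handful of sample rows are made up for the example.

```python
from datetime import date

# A few (date, value) rows, one per day, already sorted by date.
rows = [
    (date(2021, 1, 30), 10),
    (date(2021, 1, 31), 20),
    (date(2021, 2, 1), 10),
    (date(2021, 2, 2), 20),
    (date(2021, 2, 3), 10),
    (date(2021, 2, 4), 50),
    (date(2021, 2, 5), 40),
    (date(2021, 2, 6), 30),
    (date(2021, 2, 7), 10),
]

def window_sums(rows):
    """Return (date, value, five_row_sum, month_to_date) per row."""
    out = []
    running = 0
    current_month = None
    for i, (day, value) in enumerate(rows):
        # ROWS BETWEEN 4 PRECEDING AND CURRENT ROW: at most five rows.
        five = sum(v for _, v in rows[max(0, i - 4): i + 1])
        # PARTITION BY month: the running total restarts in a new month.
        month = (day.year, day.month)
        if month != current_month:
            current_month, running = month, 0
        running += value
        out.append((day, value, five, running))
    return out

result = window_sums(rows)
for r in result:
    print(r)
```

Note that the first frame is row-based: if a date were missing from the table, the five rows would span more than five calendar days.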
In MySQL 5.x there are no such functions. As such I would resort to correlated sub-queries.
SELECT
  *,
  (
    SELECT SUM(five_day_lookup.value)
      FROM your_table AS five_day_lookup
     WHERE five_day_lookup.date >= DATE_SUB(your_table.date, INTERVAL 4 DAY)
       AND five_day_lookup.date <= your_table.date
  )
    AS five_day_period,
  (
    SELECT SUM(monthly_lookup.value)
      FROM your_table AS monthly_lookup
     WHERE monthly_lookup.date >= DATE(DATE_FORMAT(your_table.date, '%Y-%m-01'))
       AND monthly_lookup.date <= your_table.date
  )
    AS month_to_date
FROM
  your_table
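If you want to try the correlated sub-query idea without a MySQL server, here is a rough sketch using Python's built-in sqlite3 module. It is not MySQL: SQLite's date(x, '-4 days') and date(x, 'start of month') are stand-ins for DATE_SUB and DATE_FORMAT, and the rows are a subset of the sample data shown further down this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("2021-01-29", 30), ("2021-01-30", 10), ("2021-01-31", 20),
        ("2021-02-01", 10), ("2021-02-02", 20), ("2021-02-03", 10),
        ("2021-02-04", 50), ("2021-02-05", 40), ("2021-02-06", 30),
        ("2021-02-07", 10),
    ],
)

# Assumption: SQLite date modifiers replace the MySQL date functions here.
# ISO-formatted text dates compare correctly as strings.
query = """
SELECT t.date,
       (SELECT SUM(w.value) FROM your_table AS w
         WHERE w.date BETWEEN date(t.date, '-4 days') AND t.date)
           AS five_day_period,
       (SELECT SUM(m.value) FROM your_table AS m
         WHERE m.date BETWEEN date(t.date, 'start of month') AND t.date)
           AS month_to_date
FROM your_table AS t
ORDER BY t.date
"""
results = {row[0]: row[1:] for row in conn.execute(query)}
print(results["2021-02-07"])
print(results["2021-02-02"])
```

Unlike the ROWS-based window, these sub-queries filter on the dates themselves, so a missing day simply contributes nothing to the sum.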
Here is another way to do that:
Select
t1.`mydate` AS 'Date'
, t1.`val` AS 'Value'
, SUM( IF(t2.`mydate` >= t1.`mydate` - INTERVAL 4 DAY,t2.val,0)) AS '5 Day Period'
, SUM( IF(t2.`mydate` >= DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),t2.val,0)) AS 'Month of Date'
FROM tab t1
LEFT JOIN tab t2 ON t2.`mydate`
BETWEEN LEAST( DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),
t1.`mydate` - INTERVAL 4 DAY)
AND t1.`mydate`
GROUP BY t1.`mydate`
ORDER BY t1.`mydate` desc;
sample
MariaDB [bkvie]> SELECT * FROM tab;
+----+------------+------+
| id | mydate | val |
+----+------------+------+
| 1 | 2021-02-07 | 10 |
| 2 | 2021-02-06 | 30 |
| 3 | 2021-02-05 | 40 |
| 4 | 2021-02-04 | 50 |
| 5 | 2021-02-03 | 10 |
| 6 | 2021-02-02 | 20 |
| 7 | 2021-01-31 | 20 |
| 8 | 2021-01-30 | 10 |
| 9 | 2021-01-29 | 30 |
| 10 | 2021-01-28 | 40 |
| 11 | 2021-01-27 | 20 |
| 12 | 2021-01-26 | 30 |
| 13 | 2021-01-25 | 10 |
| 14 | 2021-01-24 | 40 |
| 15 | 2021-02-01 | 10 |
+----+------------+------+
15 rows in set (0.00 sec)
result
MariaDB [bkvie]> Select
-> t1.`mydate` AS 'Date'
-> , t1.`val` AS 'Value'
-> , SUM( IF(t2.`mydate` >= t1.`mydate` - INTERVAL 4 DAY,t2.val,0)) AS '5 Day Period'
-> , SUM( IF(t2.`mydate` >= DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),t2.val,0)) AS 'Month of Date'
-> FROM tab t1
-> LEFT JOIN tab t2 ON t2.`mydate`
-> BETWEEN LEAST( DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),
-> t1.`mydate` - INTERVAL 4 DAY)
-> AND t1.`mydate`
-> GROUP BY t1.`mydate`
-> ORDER BY t1.`mydate` desc;
+------------+-------+--------------+---------------+
| Date | Value | 5 Day Period | Month of Date |
+------------+-------+--------------+---------------+
| 2021-02-07 | 10 | 140 | 170 |
| 2021-02-06 | 30 | 150 | 160 |
| 2021-02-05 | 40 | 130 | 130 |
| 2021-02-04 | 50 | 110 | 90 |
| 2021-02-03 | 10 | 70 | 40 |
| 2021-02-02 | 20 | 90 | 30 |
| 2021-02-01 | 10 | 110 | 10 |
| 2021-01-31 | 20 | 120 | 200 |
| 2021-01-30 | 10 | 130 | 180 |
| 2021-01-29 | 30 | 130 | 170 |
| 2021-01-28 | 40 | 140 | 140 |
| 2021-01-27 | 20 | 100 | 100 |
| 2021-01-26 | 30 | 80 | 80 |
| 2021-01-25 | 10 | 50 | 50 |
| 2021-01-24 | 40 | 40 | 40 |
+------------+-------+--------------+---------------+
15 rows in set (0.00 sec)
MariaDB [bkvie]>

Cumulative sum of multiple window functions

I have a table with the structure:
id | date | player_id | score
--------------------------------------
1 | 2019-01-01 | 1 | 1
2 | 2019-01-02 | 1 | 1
3 | 2019-01-03 | 1 | 0
4 | 2019-01-04 | 1 | 0
5 | 2019-01-05 | 1 | 1
6 | 2019-01-06 | 1 | 1
7 | 2019-01-07 | 1 | 0
8 | 2019-01-08 | 1 | 1
9 | 2019-01-09 | 1 | 0
10 | 2019-01-10 | 1 | 0
11 | 2019-01-11 | 1 | 1
I want to create two more columns, 'total_score', 'last_seven_days'.
total_score is a running sum of the player's score.
last_seven_days is the sum of the player's score over the last seven days, up to and including the date.
I have written the following SQL query:
SELECT id,
date,
player_id,
score,
sum(score) OVER all_scores AS all_score,
sum(score) OVER last_seven AS last_seven_score
FROM scores
WINDOW all_scores AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
last_seven AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING);
and get the following output:
id | date | player_id | score | all_score | last_seven_score
------------------------------------------------------------------
1 | 2019-01-01 | 1 | 1 | |
2 | 2019-01-02 | 1 | 1 | 1 | 1
3 | 2019-01-03 | 1 | 0 | 2 | 2
4 | 2019-01-04 | 1 | 0 | 2 | 2
5 | 2019-01-05 | 1 | 1 | 2 | 2
6 | 2019-01-06 | 1 | 1 | 3 | 3
7 | 2019-01-07 | 1 | 0 | 4 | 4
8 | 2019-01-08 | 1 | 1 | 4 | 4
9 | 2019-01-09 | 1 | 0 | 5 | 4
10 | 2019-01-10 | 1 | 0 | 5 | 3
11 | 2019-01-11 | 1 | 1 | 5 | 3
I have realised that I need to change this
last_seven AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING)
so that it uses some sort of date interval instead of the fixed number 7, because counting rows will introduce errors.
For example, it would be nice to be able to write date - 2 days or date - 6 days.
I would also like to add columns for 3 months, 6 months, and 12 months later down the track, so it needs to be dynamic.
DEMO
demo:db<>fiddle
Solution for Postgres 11+:
Using a RANGE interval, as @LaurenzAlbe did
Solution for Postgres <11:
(just presenting the "days" part, the "all_scores" part is the same)
Joining the table against itself on the player_id and the relevant date range:
SELECT s1.*,
(SELECT SUM(s2.score)
FROM scores s2
WHERE s2.player_id = s1.player_id
AND s2."date" BETWEEN s1."date" - interval '7 days' AND s1."date" - interval '1 days')
FROM scores s1
You need to use a window frame defined by RANGE:
last_seven AS (PARTITION BY player_id
ORDER BY date
RANGE BETWEEN INTERVAL '7 days' PRECEDING
AND INTERVAL '1 day' PRECEDING)
This solution works only from PostgreSQL v11 on.
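To see why a row-count frame and a date-range frame can disagree, here is a small Python sketch. The data is made up (one player, with 2019-01-04 to 2019-01-08 deliberately missing), and the two helper functions are hypothetical names that only mimic the frame definitions. One difference from SQL: an empty frame gives NULL in Postgres but 0 here.

```python
from datetime import date, timedelta

# (date, score) for one player; 2019-01-04 to 2019-01-08 are missing on purpose.
scores = [
    (date(2019, 1, 1), 1),
    (date(2019, 1, 2), 1),
    (date(2019, 1, 3), 0),
    (date(2019, 1, 9), 1),
    (date(2019, 1, 10), 1),
]

def sum_previous_rows(scores, n):
    """Like ROWS BETWEEN n PRECEDING AND 1 PRECEDING: counts rows, ignores dates."""
    return [sum(s for _, s in scores[max(0, i - n): i]) for i in range(len(scores))]

def sum_previous_days(scores, days):
    """Like RANGE BETWEEN 'days' PRECEDING AND '1 day' PRECEDING: uses the dates."""
    totals = []
    for day, _ in scores:
        lo, hi = day - timedelta(days=days), day - timedelta(days=1)
        totals.append(sum(s for d, s in scores if lo <= d <= hi))
    return totals

by_rows = sum_previous_rows(scores, 7)
by_days = sum_previous_days(scores, 7)
print(by_rows)  # row-based frame
print(by_days)  # date-based frame
```

For the last two dates the row-based version still reaches back to the start of January, while the date-based version only sees scores from the preceding seven days. Swapping the 7 for a longer span is the same idea as changing the interval in the RANGE clause.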

Month Not Printing When No Transaction In Particular Month

I have written a query to get employee attrition details, showing the month-wise employee counts for opening, joined, left, and closing.
The issue is that if there is no value in any of these four columns for a month, the query does not return that month at all.
Please suggest a solution.
OUTPUT:
yyear | mmonth | charmonth | opening | incoming | relived | closing
-------+--------+-----------+---------+----------+---------+---------
2018 | 4 | Apr-18 | 14 | 2 | 0 | 16
2018 | 5 | May-18 | 16 | 1 | 0 | 17
2018 | 8 | Aug-18 | 17 | 3 | 0 | 20
2018 | 9 | Sep-18 | 20 | 1 | 0 | 21
2018 | 10 | Oct-18 | 21 | 23 | 4 | 40
2018 | 11 | Nov-18 | 40 | 5 | 1 | 44
2018 | 12 | Dec-18 | 44 | 2 | 0 | 46
2019 | 1 | Jan-19 | 46 | 1 | 0 | 47
2019 | 2 | Feb-19 | 47 | 1 | 0 | 48
2019 | 3 | Mar-19 | 48 | 6 | 1 | 53
2019 | 4 | Apr-19 | 53 | 1 | 0 | 54
2019 | 5 | May-19 | 54 | 3 | 1 | 56
2019 | 6 | Jun-19 | 56 | 2 | 0 | 58
(13 rows)
If you look at the sequence of months, Jun-18 and Jul-18 are missing.
Code:
WITH table_1 AS (
select
startdate as ddate,
enddate as lastday,
extract('month' from startdate) as mmonth,
extract('year' from startdate) as yyear,
to_char(to_timestamp(startdate),'Mon-YY') as months
from shr_period
where startdate >= DATE('2018-01-01')
and enddate <= DATE('2019-07-01')
and ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
)
SELECT
table_1.yyear,
table_1.mmonth,
table_1.months as charmonth,
(SELECT
COUNT(*)
FROM shr_emp_job OPENING
WHERE OPENING.dateofjoining < table_1.ddate
and OPENING.relieveddate is null
and OPENING.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
) AS OPENING,
count(*) as incoming,
(select count(*)
from shr_emp_job rel
where rel.relieveddate is not null
and rel.dateofjoining <= table_1.lastday
and rel.dateofjoining >= table_1.ddate
and rel.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
) as relived,
(SELECT COUNT(*)
FROM shr_emp_job CLOSING
WHERE CLOSING.dateofjoining <= table_1.lastday
and relieveddate is null
and CLOSING.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
) AS CLOSING
FROM
shr_emp_job
JOIN table_1 ON table_1.mmonth = extract('month' from shr_emp_job.dateofjoining)
AND table_1.yyear = extract('year' from shr_emp_job.dateofjoining)
where shr_emp_job.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
GROUP BY table_1.mmonth, table_1.yyear, table_1.ddate, table_1.lastday, charmonth
ORDER BY table_1.yyear, table_1.mmonth;
At a quick look, try changing your JOIN from an inner join to an outer join. So instead of
FROM
shr_emp_job
JOIN table_1 ON
do
FROM
shr_emp_job
RIGHT OUTER JOIN table_1 ON
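This tells Postgres to keep the rows from the right-hand table (table_1) even when there are no matching rows in the left-hand table (shr_emp_job). For those rows, NULL is supplied for the missing values. Note that the WHERE filter on shr_emp_job.ad_org_id would then have to move into the ON condition, otherwise it discards those NULL rows again, and count(*) would count each NULL-extended row as 1, so count a shr_emp_job column instead to get 0 for incoming.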
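Here is a small, self-contained sketch of the inner versus outer join behaviour, run through Python's sqlite3 module. The tables periods and emp are hypothetical, simplified stand-ins for shr_period and shr_emp_job, and a LEFT JOIN starting from the calendar table is used as the mirror image of the RIGHT OUTER JOIN above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-ins for shr_period and shr_emp_job.
conn.executescript("""
CREATE TABLE periods (yyear INTEGER, mmonth INTEGER);
CREATE TABLE emp (name TEXT, join_year INTEGER, join_month INTEGER);
INSERT INTO periods VALUES (2018, 5), (2018, 6), (2018, 7), (2018, 8);
INSERT INTO emp VALUES ('a', 2018, 5), ('b', 2018, 8), ('c', 2018, 8);
""")

# Inner join: months without joiners disappear from the result.
inner = conn.execute("""
SELECT p.yyear, p.mmonth, COUNT(e.name)
FROM periods p
JOIN emp e ON e.join_year = p.yyear AND e.join_month = p.mmonth
GROUP BY p.yyear, p.mmonth
ORDER BY p.yyear, p.mmonth
""").fetchall()

# Outer join from the calendar table: every period row is kept.
# COUNT(e.name) skips NULLs, so empty months report 0 rather than 1.
outer = conn.execute("""
SELECT p.yyear, p.mmonth, COUNT(e.name)
FROM periods p
LEFT JOIN emp e ON e.join_year = p.yyear AND e.join_month = p.mmonth
GROUP BY p.yyear, p.mmonth
ORDER BY p.yyear, p.mmonth
""").fetchall()

print(inner)
print(outer)
```

The join condition lives entirely in the ON clause here; putting a filter on an emp column into WHERE would remove the empty months again.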

How to Calculate Median Price Per Unit Using PERCENTILE_CONT and GROUP BY id

I'm using postgres 9.5 and trying to calculate median and average price per unit with a GROUP BY id. Here is the query in DBFIDDLE
Here is the data
id | price | units
-----+-------+--------
1 | 100 | 15
1 | 90 | 10
1 | 50 | 8
1 | 40 | 8
1 | 30 | 7
2 | 110 | 22
2 | 60 | 8
2 | 50 | 11
Using percentile_cont this is my query:
SELECT id,
ceil(avg(price)) as avg_price,
percentile_cont(0.5) within group (order by price) as median_price,
ceil( sum (price) / sum (units) ) AS avg_pp_unit,
ceil( percentile_cont(0.5) within group (order by price) /
percentile_cont(0.5) within group (order by units) ) as median_pp_unit
FROM t
GROUP by id
This query returns:
id| avg_price | median_price | avg_pp_unit | median_pp_unit
--+-----------+--------------+--------------+---------------
1 | 62 | 50 | 6 | 7
2 | 74 | 60 | 5 | 5
I'm pretty sure average calculation is correct. Is this the correct way to calculate median price per unit?
The post "Calculating median with PERCENTILE_CONT and grouping" suggests this is correct (although performance is poor), but I'm curious whether the division in the median calculation could skew the result.
The median is the value separating the higher half from the lower half of a data sample (a population or a probability distribution). For a data set, it may be thought of as the "middle" value.
https://en.wikipedia.org/wiki/Median
So, across the whole table (ignoring id), your median price is 55 and the median units is 9:
        Sort by price                    Sort by units
     id | price | units           id | price | units
    ----|-------|-------         ----|-------|-------
      1 |    30 |     7            1 |    30 |     7
      1 |    40 |     8            1 |    40 |     8
      1 |    50 |     8            1 |    50 |     8
>>>   2 |    50 |    11            2 |    60 |     8   <<<
>>>   2 |    60 |     8            1 |    90 |    10   <<<
      1 |    90 |    10            2 |    50 |    11
      1 |   100 |    15            1 |   100 |    15
      2 |   110 |    22            2 |   110 |    22

     (50+60)/2 = 55               (8+10)/2 = 9
I'm unsure what you intend for "median price per unit":
CREATE TABLE t(
id INTEGER NOT NULL
,price INTEGER NOT NULL
,units INTEGER NOT NULL
);
INSERT INTO t(id,price,units) VALUES (1,30,7);
INSERT INTO t(id,price,units) VALUES (1,40,8);
INSERT INTO t(id,price,units) VALUES (1,50,8);
INSERT INTO t(id,price,units) VALUES (2,50,11);
INSERT INTO t(id,price,units) VALUES (2,60,8);
INSERT INTO t(id,price,units) VALUES (1,90,10);
INSERT INTO t(id,price,units) VALUES (1,100,15);
INSERT INTO t(id,price,units) VALUES (2,110,22);
SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY price) med_price
, percentile_cont(0.5) WITHIN GROUP (ORDER BY units) med_units
FROM
t;
| med_price | med_units
----|-----------|-----------
1 | 55 | 9
If column "price" represents a "unit price" then you don't need to divide 55 by 9, but if "price" is an "order total" then you would need to divide by units: 55/9 = 6.11
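To make the ambiguity concrete, here is a short Python sketch comparing two possible readings of "median price per unit", per id: the ratio of the two medians (what the question's query does before ceil) and the median of each row's own price / units. The second reading is only an assumption about what might be intended if "price" is an order total; the function names are made up.

```python
from statistics import median

# (price, units) rows per id, as in the question's sample data.
data = {
    1: [(100, 15), (90, 10), (50, 8), (40, 8), (30, 7)],
    2: [(110, 22), (60, 8), (50, 11)],
}

def ratio_of_medians(rows):
    """median(price) / median(units), as in the question's query."""
    return median(p for p, _ in rows) / median(u for _, u in rows)

def median_of_ratios(rows):
    """Median of each row's own price / units (one possible alternative)."""
    return median(p / u for p, u in rows)

summary = {
    key: (round(ratio_of_medians(rows), 2), round(median_of_ratios(rows), 2))
    for key, rows in data.items()
}
print(summary)
```

For id 1 the two readings happen to agree (6.25 both ways), but for id 2 they differ (about 5.45 versus 5.0), because the median price and the median units come from different rows. Which one is "correct" depends on what the number is supposed to mean.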

Rank based on row number SQL Server 2008 R2

I want to group rank my table data by rowcount. First 12 rows that are ordered by date for each ProductID would get value = 1. Next 12 rows would get value = 2 assigned and so on.
How table structure looks:
For ProductID = 1267 are below associated dates:
02-01-2016
03-01-2016
.
. (skipping months..table has one date per month)
.
12-01-2016
02-01-2017
.
.
.
02-01-2018
Use row_number() over() with some integer arithmetic to calculate groups of 12 ordered by date (per productid). Change the sort to ASCending or DESCending to suit your need.
select *
, (11 + row_number() over(partition by productid order by somedate DESC)) / 12 as rnk
from mytable
GO
myTableID | productid | somedate | rnk
--------: | :------------- | :------------------ | :--
9 | 123456 | 2018-11-12 08:24:25 | 1
8 | 123456 | 2018-10-02 12:29:04 | 1
7 | 123456 | 2018-09-09 02:39:30 | 1
2 | 123456 | 2018-09-02 08:49:37 | 1
1 | 123456 | 2018-07-04 12:25:06 | 1
5 | 123456 | 2018-06-06 11:38:50 | 1
12 | 123456 | 2018-05-23 21:12:03 | 1
18 | 123456 | 2018-04-02 03:59:16 | 1
3 | 123456 | 2018-01-02 03:42:24 | 1
17 | 123456 | 2017-11-29 03:19:32 | 1
10 | 123456 | 2017-11-10 00:45:41 | 1
13 | 123456 | 2017-11-05 09:53:38 | 1
16 | 123456 | 2017-10-20 15:39:42 | 2
4 | 123456 | 2017-10-14 19:25:30 | 2
20 | 123456 | 2017-09-21 21:31:06 | 2
6 | 123456 | 2017-04-06 22:10:58 | 2
14 | 123456 | 2017-03-24 23:35:52 | 2
19 | 123456 | 2017-01-22 05:07:23 | 2
11 | 123456 | 2016-12-13 19:17:08 | 2
15 | 123456 | 2016-12-02 03:22:32 | 2
dbfiddle here
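The arithmetic behind the rnk column is easy to check on its own. A minimal Python sketch of the same integer division follows; the function name group_rank and its size parameter are just for illustration.

```python
def group_rank(row_number, size=12):
    """Map a 1-based row number to its 1-based group of `size` rows."""
    # Same arithmetic as (11 + row_number()) / 12 with integer division:
    # rows 1..12 give 12..23, which all divide down to 1, and so on.
    return (size - 1 + row_number) // size

ranks = [group_rank(n) for n in range(1, 21)]
print(ranks)
```

With 20 rows this gives twelve 1s followed by eight 2s, which matches the rnk column in the output above.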