I have a dataset with 3 columns:
Item_id
Sourced_from
Cost
1
Local
15
2
Local
10
3
Local
20
4
International
60
I am trying to write a query in PostgreSQL to fetch total of local and international items, customer can buy within the cash limit. For a cash limit 50, this is the output I am expecting:
Local
International
3
0
I have a pretty basic knowledge of PostgreSQL, and after googling it seems like this could be solved with recursive CTE, I am unable to figure out how should I select my source seed/anchor point in this scenario.
Any ideas, how should I approach this?
Not with a recursive CTE, but still works:
DDL/DML:
create table T
(
id integer primary key generated by default AS IDENTITY,
kind text not null,
cost integer not null
);
insert into T(kind, cost)
values ('local', 15),
('local', 10),
('local', 20),
('international', 60);
-- 4. This outer CTE and the following self-join is only necessary in order to display the rows that have a count() of 0
with sub as
(
-- 3. find the total cost of buying this row + all previous rows, grouped by its kind
select X.kind, sum(X.cost) as cost, X.rn
from (
with cte as (
-- 1. assign an increasing row number on each row from the table ordered by its cost
select *, row_number() over (order by T.cost asc, T.kind) as rn
from T
)
-- 2. self-join the CTE on each row with the same kind, but join it only with the rows that have a row number less than or equal to the current row number
select A.id, A.kind, A.cost, B.rn
from cte as A
join cte as B on A.kind = B.kind and A.rn <= B.rn
) as X
group by X.kind, X.rn
)
select M.kind, count(N.*)
from sub as M -- 5. count only the amount of goods that fit in out budget (i.e. 50)
left outer join sub as N on M.rn = N.rn and N.cost <= 50
group by M.kind
;
Output (db-fiddle):
+-------------+-----+
|kind |count|
+-------------+-----+
|local |3 |
|international|0 |
+-------------+-----+
I made a CTE example to solve the problem:
Recreated your case with
create table kp (item_id int, sourced_from varchar, cost int);
insert into kp values (1,'local',15);
insert into kp values (2,'local',10);
insert into kp values (3,'local',20);
insert into kp values (4,'international',60);
The following query does:
Selects from kp only items with cost less than 50
adds the item_id in the list_of_items
The recursive bit does:
joins with kp checking the source_from is the same and the kp.item_id is not already contained in the list_of_items (avoiding to put the same item multiple times)
computes the total cost (total_cost)
adds the new item item_id to the list_of_items
WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
SELECT
item_id,
item_id as next_item_id,
sourced_from,
cost as total_cost,
1 as nr_items,
ARRAY[item_id] list_of_items
from kp where cost < 50
UNION ALL
SELECT
kp.item_id,
items.item_id as next_item_id,
items.sourced_from,
items.total_cost + kp.cost total_cost,
items.nr_items + 1 as nr_items,
items.list_of_items || kp.item_id as list_of_items
FROM kp join items
on items.sourced_from=kp.sourced_from
and items.list_of_items::int[] #> ARRAY[kp.item_id] = false
WHERE kp.cost + items.total_cost < 50
)
SELECT * FROM items;
If you run against the above dataset you'll end up with the detailed result
item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items
---------+--------------+--------------+------------+----------+---------------
1 | 1 | local | 15 | 1 | {1}
2 | 2 | local | 10 | 1 | {2}
3 | 3 | local | 20 | 1 | {3}
1 | 2 | local | 25 | 2 | {2,1}
1 | 3 | local | 35 | 2 | {3,1}
2 | 1 | local | 25 | 2 | {1,2}
2 | 3 | local | 30 | 2 | {3,2}
3 | 1 | local | 35 | 2 | {1,3}
3 | 2 | local | 30 | 2 | {2,3}
1 | 2 | local | 45 | 3 | {3,2,1}
1 | 3 | local | 45 | 3 | {2,3,1}
2 | 1 | local | 45 | 3 | {3,1,2}
2 | 3 | local | 45 | 3 | {1,3,2}
3 | 1 | local | 45 | 3 | {2,1,3}
3 | 2 | local | 45 | 3 | {1,2,3}
(15 rows)
which shows all the permutations of the 3 local items.
Now if you substitute the last SELECT section with
SELECT * FROM items order by nr_items desc, total_cost desc, list_of_items asc limit 1;
You'll be able also to pick the combination having the max number of items, with the cost closest to the budget (I added also an ascending ordering based on list_of_items to receive always the same result in case of multiple combinations), which in the case above would result in
item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items
---------+--------------+--------------+------------+----------+---------------
3 | 2 | local | 45 | 3 | {1,2,3}
(1 row)
If you are just interested in the maximum by sourced_from then the last SELECT becomes
select sourced_from, max(nr_items) nr_items from items group by sourced_from;
with the expected result being
sourced_from | nr_items
--------------+----------
local | 3
(1 row)
Edit: to speed up the query and avoiding having multiple permutations of the same objects (e.g. {1,2,3} and {1,2,3}) we can force the next item_id to be greater of the current one. Full query
WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
SELECT
item_id,
item_id as next_item_id,
sourced_from,
cost as total_cost,
1 as nr_items,
ARRAY[item_id] list_of_items
from kp where cost < 50
UNION ALL
SELECT
kp.item_id,
items.item_id as next_item_id,
items.sourced_from,
items.total_cost + kp.cost total_cost,
items.nr_items + 1 as nr_items,
items.list_of_items || kp.item_id as list_of_items
FROM kp join items
on items.sourced_from=kp.sourced_from
and items.list_of_items::int[] #> ARRAY[kp.item_id] = false
and items.item_id < kp.item_id
WHERE kp.cost + items.total_cost < 50
)
select * from items;
result
item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items
---------+--------------+--------------+------------+----------+---------------
1 | 1 | local | 15 | 1 | {1}
2 | 2 | local | 10 | 1 | {2}
3 | 3 | local | 20 | 1 | {3}
2 | 1 | local | 25 | 2 | {1,2}
3 | 1 | local | 35 | 2 | {1,3}
3 | 2 | local | 30 | 2 | {2,3}
3 | 2 | local | 45 | 3 | {1,2,3}
(7 rows)
DbFiddle
Stuck. Need SO :)
Consider the following distribution of values.
ID CNT SEC SHOW(Bool)
1 10 1
2 1 1
3 25 1
4 1 1
5 2 1
6 10 1
7 50 2
8 90 2
My goal is to filter by sec and then
sort by cnt ascending,
sort by id ascending
and then flag/filter all rows as show - false where cnt is < 5 and until the sum of cnt of all hidden rows (show=false) is >= 5.
So the sum of all "hidden" rows may never be < 5.
Expected outcome for sec=1:
| id | cnt | cnt_sum | show |
|----|-----|---------|-------|
| 2 | 1 | 1 | false |
| 4 | 1 | 2 | false |
| 5 | 2 | 4 | false |
| 1 | 10 | 14 | false | -- The sum of all hidden rows before this point is 4
| 6 | 10 | 24 | true | -- The total of all hidden rows is now >= 5.
| 3 | 25 | 49 | true |
Expected outcome for sec=2:
| id | cnt | cnt_sum | show |
|----|-----|---------|-------|
| 7 | 50 | 50 | true |
| 8 | 90 | 140 | true |
I can already sort the values and create the sums etc. I have not figured out, how to determine how to set the cutoff point, when "hidding" is not necessary.
I am already doing this in "client code" and I want to migrate it to sql.
Here LAG() will help to achieve what you want. You can write your query like below:
with cte as (
SELECT
id, cnt, sec,
sum(cnt) over (partition by sec order by cnt,id) sum_
FROM
tbl )
select
id, cnt, sum_,
case
when sum_<5 or lag(sum_) over (partition by sec order by cnt,id) <5 then 'false'
else
'true'
end as "show"
from cte
DEMO
TIL about tablefunc and crosstab. At first I wanted to "group data by columns" but that doesn't really mean anything.
My product sales look like this
product_id | units | date
-----------------------------------
10 | 1 | 1-1-2018
10 | 2 | 2-2-2018
11 | 3 | 1-1-2018
11 | 10 | 1-2-2018
12 | 1 | 2-1-2018
13 | 10 | 1-1-2018
13 | 10 | 2-2-2018
I would like to produce a table of products with months as columns
product_id | 01-01-2018 | 02-01-2018 | etc.
-----------------------------------
10 | 1 | 2
11 | 13 | 0
12 | 0 | 1
13 | 20 | 0
First I would group by month, then invert and group by product, but I cannot figure out how to do this.
After enabling the tablefunc extension,
SELECT product_id, coalesce("2018-1-1", 0) as "2018-1-1"
, coalesce("2018-2-1", 0) as "2018-2-1"
FROM crosstab(
$$SELECT product_id, date_trunc('month', date)::date as month, sum(units) as units
FROM test
GROUP BY product_id, month
ORDER BY 1$$
, $$VALUES ('2018-1-1'::date), ('2018-2-1')$$
) AS ct (product_id int, "2018-1-1" int, "2018-2-1" int);
yields
| product_id | 2018-1-1 | 2018-2-1 |
|------------+----------+----------|
| 10 | 1 | 2 |
| 11 | 13 | 0 |
| 12 | 0 | 1 |
| 13 | 10 | 10 |
I'm using postgres 9.5 and trying to calculate median and average price per unit with a GROUP BY id. Here is the query in DBFIDDLE
Here is the data
id | price | units
-----+-------+--------
1 | 100 | 15
1 | 90 | 10
1 | 50 | 8
1 | 40 | 8
1 | 30 | 7
2 | 110 | 22
2 | 60 | 8
2 | 50 | 11
Using percentile_cont this is my query:
SELECT id,
ceil(avg(price)) as avg_price,
percentile_cont(0.5) within group (order by price) as median_price,
ceil( sum (price) / sum (units) ) AS avg_pp_unit,
ceil( percentile_cont(0.5) within group (order by price) /
percentile_cont(0.5) within group (order by units) ) as median_pp_unit
FROM t
GROUP by id
This query returns:
id| avg_price | median_price | avg_pp_unit | median_pp_unit
--+-----------+--------------+--------------+---------------
1 | 62 | 50 | 6 | 7
2 | 74 | 60 | 5 | 5
I'm pretty sure average calculation is correct. Is this the correct way to calculate median price per unit?
This post suggests this is correct (although performance is poor) but I'm curious if the division in the median calculation could skew the result.
Calculating median with PERCENTILE_CONT and grouping
The median is the value separating the higher half from the lower half of a data sample (a population or a probability distribution). For a data set, it may be thought of as the "middle" value.
https://en.wikipedia.org/wiki/Median
So your median price is 55, and the median units is 9
Sort by price Sort by units
id | price | units | | id | price | units
-------|-----------|--------| |-------|---------|----------
1 | 30 | 7 | | 1 | 30 | 7
1 | 40 | 8 | | 1 | 40 | 8
1 | 50 | 8 | | 1 | 50 | 8
>>> 2 | 50 | 11 | | 2 | 60 | 8 <<<<
>>> 2 | 60 | 8 | | 1 | 90 | 10 <<<<
1 | 90 | 10 | | 2 | 50 | 11
1 | 100 | 15 | | 1 | 100 | 15
2 | 110 | 22 | | 2 | 110 | 22
| | | | | |
(50+60)/2 (8+10)/2
55 9
I'm unsure what you intend for "median price per unit":
CREATE TABLE t(
id INTEGER NOT NULL
,price INTEGER NOT NULL
,units INTEGER NOT NULL
);
INSERT INTO t(id,price,units) VALUES (1,30,7);
INSERT INTO t(id,price,units) VALUES (1,40,8);
INSERT INTO t(id,price,units) VALUES (1,50,8);
INSERT INTO t(id,price,units) VALUES (2,50,11);
INSERT INTO t(id,price,units) VALUES (2,60,8);
INSERT INTO t(id,price,units) VALUES (1,90,10);
INSERT INTO t(id,price,units) VALUES (1,100,15);
INSERT INTO t(id,price,units) VALUES (2,110,22);
SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY price) med_price
, percentile_cont(0.5) WITHIN GROUP (ORDER BY units) med_units
FROM
t;
| med_price | med_units
----|-----------|-----------
1 | 55 | 9
If column "price" represents a "unit price" then you don't need to divide 55 by 9, but if "price" is an "order total" then you would need to divide by units: 55/9 = 6.11
I have this table named Samples. The Date column values are just symbolic date values.
+----+------------+-------+------+
| Id | Product_Id | Price | Date |
+----+------------+-------+------+
| 1 | 1 | 100 | 1 |
| 2 | 2 | 100 | 2 |
| 3 | 3 | 100 | 3 |
| 4 | 1 | 100 | 4 |
| 5 | 2 | 100 | 5 |
| 6 | 3 | 100 | 6 |
...
+----+------------+-------+------+
I want to group by product_id such that I have the 1'th sample in descending date order and a new colomn added with the Price of the 7'th sample row in each product group. If the 7'th row does not exist, then the value should be null.
Example:
+----+------------+-------+------+----------+
| Id | Product_Id | Price | Date | 7thPrice |
+----+------------+-------+------+----------+
| 4 | 1 | 100 | 4 | 120 |
| 5 | 2 | 100 | 5 | 100 |
| 6 | 3 | 100 | 6 | NULL |
+----+------------+-------+------+----------+
I belive I can achieve the table without the '7thPrice' with the following
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Product_Id ORDER BY date DESC) r, * FROM Samples
) T WHERE T.r = 1
Any suggestions?
You can try something like this. I used your query to create a CTE. Then joined rank1 to rank7.
;with sampleCTE
as
(SELECT ROW_NUMBER() OVER (PARTITION BY Product_Id ORDER BY date DESC) r, * FROM Samples)
select *
from
(select * from samplecte where r = 1) a
left join
(select * from samplecte where r=7) b
on a.product_id = b.product_id