PostgreSQL get aggregated results based on two tables - postgresql

I have two tables,
1. items
2. festival_rents
Sample records:
items
id | name | rent
------------------------
1 | Car | 100
2 | Truck | 150
3 | Van | 200
Sample records:
festival_rents
id | items_id | start_date | end_date | rent
------------------------------------------------
1 | 1 | 2018-07-01 | 2018-07-02 | 200
2 | 1 | 2018-07-04 | 2018-07-06 | 300
3 | 3 | 2018-07-06 | 2018-07-07 | 400
The table items contains list of items with name and rent. Each item in the items table may or may not have festival_rents. The table festival_rents has higher rents for each item for a date range with start_date and end_date. It is possible for a item to have multiple festival_rents with different date ranges. But it's for sure that date ranges for multiple festival_rents belonging to a same item won't collide and all date ranges are isolated.
The query that I'm looking for is, for a given start_date and end_date range, for each item in the items table, calculate the total rent and display each item with it's calculated total rent. The rent calculation for each item should include the festival_rents also, if any of the items has festival_rents falling within the given start_date and end_date.
Expected result:
Input: start_date=2018-07-01 and end_date=2018-07-06
Output:
id | name | total_price
------------------------
1 | Car | 1100 // 1st 2 days festival rent + 1 day normal rent + last 3 days festival rent (2 * 200) + (1 * 100) + (3 * 200)
2 | Truck | 900 // 6 days normal rent (6 * 150)
3 | Van | 1400 // 5 days normal rent + 1 day festival rent (200 * 5) + (400 * 1)

You need a list of days either on a table or create on the fly:
How to get list of dates between two dates in mysql select query
Generating time series between two dates in PostgreSQL
SELECT i.name, SUM (f.rent)
FROM allDays a
JOIN festival_rents f
ON a.day >= f.start_date
AND a.day < f.end_date
JOIN items i
ON f.item_id = i.item_id
WHERE a.day BETWEEN #start_date
AND #end_date
GROUP BY i.name
I assume the end_date is open range. So if you have ranges [A,B) and [B,C) Date B will have the rent from [B,C)

Related

How to group by counted rows in Postgres?

If I have a table:
id | status
----+--------
2 | 200
1 | 0
4 | 100
3 | 200
5 | 200
I want to count the number of occurrences of each status. I have tried to use the COUNT/OVER function
SELECT status, COUNT(*) OVER () AS all, COUNT(*) OVER (PARTITION by status) as count FROM my_table;
The results are what is expected per the postgres docs on windows "However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities"
status | all | count
--------+-------+-------
0 | 5 | 1
100 | 5 | 1
200 | 5 | 3
200 | 5 | 3
200 | 5 | 3
How instead can get an output that combines the rows, so that I only get 1 row per unique status if the partition is required?
status | all | count
--------+-------+-------
0 | 5 | 1
100 | 5 | 1
200 | 5 | 3
No window function necessary in the first stage of the query, i.e. getting the counts per status. Window functions work on the result of the non-windowing part of the query, thus you can have a window function referring the aggregate & non-aggregate columns in a query. To get all_counts, it is sufficient to SUM the status_count over all the rows.
SELECT
status
, COUNT(*) status_count
, SUM(COUNT(*)) OVER () all_count
FROM my_table
GROUP BY status

Count records between rolling date range in Tableau

I have a file with a [start] and [end] date in Tableau and would like to create a calculated field that counts number of rows on a rolling basis that occur between [start] and [end] for each [person]. This data is like so:
| Start | End | Person
|1/1/2019 |1/7/2019 | A
|1/3/2019 |1/9/2019 | A
|1/8/2019 |1/15/2019| A
|1/1/2019 |1/7/2019 | B
I'd like to create a calculated field [count] with results like so:
| Start | End | Person | Count
|1/1/2019 |1/7/2019 | A | 1
|1/3/2019 |1/9/2019 | A | 2
|1/8/2019 |1/15/2019| A | 2
|1/1/2019 |1/7/2019 | B | 1
EDITED: A good analogy for what [count] represents is: "how many videos does each person rented at the same time as of that moment?" With the 1st row for person A, count is 1, with 1 item rented. As of row 2, person A has 2 items rented. But for the 3rd row [count]= 2 since the video rented in the first row is no longer rented.

How to count rows using a variable date range provided by a table in PostgreSQL

I suspect I require some sort of windowing function to do this. I have the following item data as an example:
count | date
------+-----------
3 | 2017-09-15
9 | 2017-09-18
2 | 2017-09-19
6 | 2017-09-20
3 | 2017-09-21
So there are gaps in my data first off, and I have another query here:
select until_date, until_date - (lag(until_date) over ()) as delta_days from ranges
Which I have generated the following data:
until_date | delta_days
-----------+-----------
2017-09-08 |
2017-09-11 | 3
2017-09-13 | 2
2017-09-18 | 5
2017-09-21 | 3
2017-09-22 | 1
So I'd like my final query to produce this result:
start_date | ending_date | total_items
-----------+-------------+--------------
2017-09-08 | 2017-09-10 | 0
2017-09-11 | 2017-09-12 | 0
2017-09-13 | 2017-09-17 | 3
2017-09-18 | 2017-09-20 | 15
2017-09-21 | 2017-09-22 | 3
Which tells me the total count of items from the first table, per day, based on the custom ranges from the second table.
In this particular example, I would be summing up total_items BETWEEN start AND end (since there would be overlap on the dates, I'd subtract 1 from the end date to not count duplicates)
Anyone know how to do this?
Thanks!
Use the daterange type. Note that you do not have to calculate delta_days, just convert ranges to dataranges and use the operator <# - element is contained by.
with counts(count, date) as (
values
(3, '2017-09-15'::date),
(9, '2017-09-18'),
(2, '2017-09-19'),
(6, '2017-09-20'),
(3, '2017-09-21')
),
ranges (until_date) as (
values
('2017-09-08'::date),
('2017-09-11'),
('2017-09-13'),
('2017-09-18'),
('2017-09-21'),
('2017-09-22')
)
select daterange, coalesce(sum(count), 0) as total_items
from (
select daterange(lag(until_date) over (order by until_date), until_date)
from ranges
) s
left join counts on date <# daterange
where not lower_inf(daterange)
group by 1
order by 1;
daterange | total_items
-------------------------+-------------
[2017-09-08,2017-09-11) | 0
[2017-09-11,2017-09-13) | 0
[2017-09-13,2017-09-18) | 3
[2017-09-18,2017-09-21) | 17
[2017-09-21,2017-09-22) | 3
(5 rows)
Note, that in the dateranges above lower bounds are inclusive while upper bound are exclusive.
If you want to calculate items per day in the dateranges:
select
daterange, total_items,
round(total_items::dec/(upper(daterange)- lower(daterange)), 2) as items_per_day
from (
select daterange, coalesce(sum(count), 0) as total_items
from (
select daterange(lag(until_date) over (order by until_date), until_date)
from ranges
) s
left join counts on date <# daterange
where not lower_inf(daterange)
group by 1
) s
order by 1
daterange | total_items | items_per_day
-------------------------+-------------+---------------
[2017-09-08,2017-09-11) | 0 | 0.00
[2017-09-11,2017-09-13) | 0 | 0.00
[2017-09-13,2017-09-18) | 3 | 0.60
[2017-09-18,2017-09-21) | 17 | 5.67
[2017-09-21,2017-09-22) | 3 | 3.00
(5 rows)

Postgresql Time Series for each Record

I'm having issues trying to wrap my head around how to extract some time series stats from my Postgres DB.
For example, I have several stores. I record how many sales each store made each day in a table that looks like:
+------------+----------+-------+
| Date | Store ID | Count |
+------------+----------+-------+
| 2017-02-01 | 1 | 10 |
| 2017-02-01 | 2 | 20 |
| 2017-02-03 | 1 | 11 |
| 2017-02-03 | 2 | 21 |
| 2017-02-04 | 3 | 30 |
+------------+----------+-------+
I'm trying to display this data on a bar/line graph with different lines per Store and the blank dates filled in with 0.
I have been successful getting it to show the sum per day (combining all the stores into one sum) using generate_series, but I can't figure out how to separate it out so each store has a value for each day... the result being something like:
["Store ID 1", 10, 0, 11, 0]
["Store ID 2", 20, 0, 21, 0]
["Store ID 3", 0, 0, 0, 30]
It is necessary to build a cross join dates X stores:
select store_id, array_agg(total order by date) as total
from (
select store_id, date, coalesce(sum(total), 0) as total
from
t
right join (
generate_series(
(select min(date) from t),
(select max(date) from t),
'1 day'
) gs (date)
cross join
(select distinct store_id from t) s
) using (date, store_id)
group by 1,2
) s
group by 1
order by 1
;
store_id | total
----------+-------------
1 | {10,0,11,0}
2 | {20,0,21,0}
3 | {0,0,0,30}
Sample data:
create table t (date date, store_id int, total int);
insert into t (date, store_id, total) values
('2017-02-01',1,10),
('2017-02-01',2,20),
('2017-02-03',1,11),
('2017-02-03',2,21),
('2017-02-04',3,30);

Max consecutive years for each customer in Tableau

I am trying to find for each customer the Max consecutive years he buys something. I tried to create a calculated field but to no avail.
I created two calculated fields
Consecutive: if max([Count])>0 then previous_value(0)+1+index()-index() else 0 end
max: window_max([Consecutive])
My data looks something like:
Year | Customer | Count
1996 | a | 2
1996 | b | 1
1997 | a | 1
1997 | b | 2
1998 | b | 1
So the result would be
a:2
b:3
Use nested table calcs.
The first calc, call it running_good_years, is a running count of consecutive years with sales.
If count(Sales) = 0 then 0 else previous_value(0) + 1 end
The second just returns the max
Window_max(running_good_years)
With table calcs, defining the partitioning and addressing is critical. Partition by Customer, Address by year