Group values by date in a generated timeseries - group-by

I have a table called 'daily_budgets':
+----+------------+----------+--------------+
| id | start_date | end_date | daily_budget |
+----+------------+----------+--------------+
| 1 | 25/04/18 | 29/04/18 | 500 |
+----+------------+----------+--------------+
| 2 | 26/04/18 | 27/04/18 | 1000 |
+----+------------+----------+--------------+
that shows the daily budget that an item has between a designated timeframe (start_date - end_date).
Then I have another one called 'year_2018' where I have generated a time_series with all the dates for 2018:
+----------+
| date |
+----------+
| 01/01/18 |
+----------+
| 02/01/18 |
+----------+
| 03/01/18 |
+----------+
| 04/01/18 |
+----------+
| 05/01/18 |
+----------+
etc.
Now I just want to join those two tables so that I get total daily_budget grouped by date. The first date in that resulting table should be the minimum start_date that there is in table 'daily_budgets'.
+----------+--------------+
| date | daily_budget |
+----------+--------------+
| 25/04/18 | 500 |
+----------+--------------+
| 26/04/18 | 1500 |
+----------+--------------+
| 27/04/18 | 1500 |
+----------+--------------+
| 28/04/18 | 500 |
+----------+--------------+
| 29/04/18 | 500 |
+----------+--------------+
Thanks very much for the help!
I am using: PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.2058

Join the two tables on the condition of the date from your calendar table being in between (inclusive) of the start and end budget dates. Then aggregate by date to generate the totals.
SELECT
c.date,
SUM(daily_budget) AS daily_budget
FROM year_2018 c
INNER JOIN daily_budgets db
ON c.date BETWEEN db.start_date AND db.end_date
GROUP BY
c.date;
Note that I use an inner join here, which will have the effect of filtering off any dates from the calendar table which do not appear in at least one record in the budget table. This should enforce that the earliest date you see reported is also the earliest date in the budget table.

Related

Create a pivot table for Month over Month variation

I have these records returned from a query
+---------+--------------+-----------+----------+
| Country | other fields | sales | date |
+---------+--------------+-----------+----------+
| US | 1 | $100.00 | 01/01/21 |
| CA | 1 | $100.00 | 01/01/21 |
| UK | 1 | $100.00 | 01/01/21 |
| FR | 1 | $100.00 | 01/01/21 |
| US | 1 | $200.00 | 01/02/21 |
| CA | 1 | $200.00 | 01/02/21 |
| UK | 1 | $200.00 | 01/02/21 |
| FR | 1 | $200.00 | 01/02/21 |
And I want to show the sales variation from one month to previous, like this:
| Country | 01/02/21 | 01/01/21 | Var% |
| US | $200.00 | $100.00 | 100% |
| CA | $200.00 | $100.00 | 100% |
| FR | $200.00 | $100.00 | 100% |
+---------+--------------+-----------+----------+
How could be done with a Postgres query?
if you always comparing two month only :
select country
, sum(sales) filter (where date ='01/01/21') month1
, sum(sales) filter (where date ='01/02/21') month2
, ((sum(sales) filter (where date ='01/02/21') /sum(sales) filter (where date ='01/01/21')) - 1) * 100 var
from tablename
where date in ('01/01/21' , '01/02/21')
group by country
you also can look at crosstab from tablefunc extension which basically does the same as above query.
CREATE EXTENSION IF NOT EXISTS tablefunc;
select * ,("01/02/21" /"01/01/21") - 1) * 100 var
from(
select * from crosstab ('select Country,date , sales from tablename')
as ct(country varchar(2),"01/01/21" money , "01/02/21" money)
) t
for more info about crosstab , see tablefunc
but if you want to show date in rows instead of columns, you can easily generalize it for all the dates :
select *
, ((sales / LAG(sales,1,1) over (partition by country order by date)) -1)* 100 var
from
country

How to get non-aggregated measures?

I calculate my metrics with SQL and publish the resulting table to Tableau Server. Afterward, use this data source to create charts and dashboards.
For one analysis, I already calculated the measures per day with SQL. When I use the resulting table in Tableau, it aggregates these measures to SUM by default. However, I don't want to have SUM or AVG of the average or SUM of the Percentiles.
What I want is the result when I don't select date dimension and not GROUP BY date in SQL as attached below.
Here is the query:
SELECT
-- date,
COUNT(DISTINCT id) AS count_of_id,
AVG(timediff_in_sec) AS avg_timediff,
PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_25,
PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_50
FROM
(
--subquery
) AS t1
-- GROUP BY date
Here are the first 10 rows of the resulting table:
+------------+--------------+-------------+---------------+---------------+
| date | avg_timediff | count_of_id | percentile_25 | percentile_50 |
+------------+--------------+-------------+---------------+---------------+
| 10/06/2020 | 61,65186364 | 22 | 8,5765 | 13,3015 |
| 11/06/2020 | 127,2913333 | 3 | 15,6045 | 17,494 |
| 12/06/2020 | 306,0348214 | 28 | 12,2565 | 17,629 |
| 13/06/2020 | 13,2664 | 5 | 11,944 | 13,862 |
| 14/06/2020 | 16,728 | 7 | 14,021 | 17,187 |
| 15/06/2020 | 398,6424595 | 37 | 11,893 | 19,271 |
| 16/06/2020 | 293,6925152 | 33 | 12,527 | 17,134 |
| 17/06/2020 | 155,6554286 | 21 | 13,452 | 16,715 |
| 18/06/2020 | 383,8101429 | 7 | 266,048 | 493,722 |
+------------+--------------+-------------+---------------+---------------+
How can I achieve the desired output above?
Drag them all into the dimensions list, then they will be static dimensions. For your use you could also just drag the Date field to Rows. Aggregating 1 value, which you have for each date, returns the same value whatever the aggregation type.

Sumif in Postgresql between two tables

These are my two sample tables.
table "outage" (column formats are text, timestamp, timestamp)
+-------------------+----------------+----------------+
| outage_request_id | actual_start | actual_end |
+-------------------+----------------+----------------+
| 1-07697685 | 4/8/2015 4:48 | 4/8/2015 9:02 |
| 1-07223444 | 7/17/2015 4:24 | 8/01/2015 9:23 |
| 1-07223450 | 2/13/2015 4:24 | 4/29/2015 1:03 |
| 1-07223669 | 4/28/2017 9:20 | 4/30/2017 6:58 |
| 1-08985319 | 8/24/2015 3:18 | 8/24/2015 8:27 |
+-------------------+----------------+----------------+
and a second table "prices" (column format is numeric, timestamp)
+-------+---------------+
| price | stamp |
+-------+---------------+
| -2.31 | 2/1/2018 3:00 |
| -2.35 | 2/1/2018 4:00 |
| -1.77 | 2/1/2018 5:00 |
| -2.96 | 2/1/2018 6:00 |
| -5.14 | 2/1/2018 7:00 |
+-------+---------------+
My Goal: To sum the prices in between the start and stop times of each outage_request_id.
I have no idea how to go about properly joining the tables and getting a sum of prices in those outage timestamp ranges.
I can't promise this is efficient (in fact for very large tables I feel pretty confident it's not), but this should notionally get you what you want:
select
o.outage_request_id, o.actual_start, o.actual_end,
sum (p.price) as total_price
from
outage o
left join prices p on
p.stamp between o.actual_start and o.actual_end
group by
o.outage_request_id, o.actual_start, o.actual_end

postgres lag when data is missing

I have data on baseball players annual salaries, with some years missing. What I would like to do is calculate the min, max, average change in salary from the prior year for all players in a year.
For example data looks like below from the table 'salaries':
| playerid | yearid | salary |
| a | 2016 | 10000 |
| b | 2016 | 5000 |
| a | 2015 | 9000 |
| b | 2015 | 3000 |
| a | 2014 | 3000 |
| b | 2014 | 15000 |
| a | 2010 | 1000 |
As you can see, player A has a yearly change of 1k and 6k. player B has a yearly change of 2k and -12k. So I would like a select statement that brings out:
| yearid | min change | max change | avg change |
| 2016 | 1k | 2k | 1.5k |
| 2015 | -12k | 6k | -9k |
Is there a way to do this?
My lag function has unfortunately captured the difference between 2014 and 2010 for playerid a and that is obviously wrong. I couldn't figure out how to use the lag function only if the previous row's yearid was 1 less than the current rows yearid.
Any suggestions would be greatly appreciated.
Just use the previous year for the filtering:
select year, min(salary - prev_salary), max(salary - prev_salary),
avg(salary - prev_salary)
from (select s.*,
lag(s.salary) over (partition by s.playerid order by yearid) as prev_salary,
lag(s.yearid) over (partition by s.playerid order by yearid) as prev_yearid
from salaries s
) s
where prev_yearid = yearid - 1;
Or, you can just use a join:
select s.yearid, . . .
from salaries s join
salaries sp
on sp.playerid = s.playerid and sp.yearid = s.yearid - 1
group by s.yearid;

Join column with timestamps where value is maximum

I have a table that looks like
+-------+-----------+
| value | timestamp |
+-------+-----------+
and I'm trying to build a query that gives a result like
+-------+-----------+------------+------------------------+
| value | timestamp | MAX(value) | timestamp of max value |
+-------+-----------+------------+------------------------+
so that the result looks like
+---+----------+---+----------+
| 1 | 1.2.1001 | 3 | 1.1.1000 |
| 2 | 5.5.1021 | 3 | 1.1.1000 |
| 3 | 1.1.1000 | 3 | 1.1.1000 |
+---+----------+---+----------+
but I got stuck on joining the column with the corresponding timestamps.
Any hints or suggestions?
Thanks in advance!
For further information (if that helps):
In the real project the max-values are grouped by month and day (with group by clause, which works btw), but somehow I got stuck on joining the timestamps for max-values.
EDIT
Cross joins are a good idea, but I want to have them grouped by month e.g.:
+---+----------+---+----------+
| 1 | 1.1.1101 | 6 | 1.1.1300 |
| 2 | 2.6.1021 | 5 | 5.6.1000 |
| 3 | 1.1.1200 | 6 | 1.1.1300 |
| 4 | 1.1.1040 | 6 | 1.1.1300 |
| 5 | 5.6.1000 | 5 | 5.6.1000 |
| 6 | 1.1.1300 | 6 | 1.1.1300 |
+---+----------+---+----------+
EDIT 2
I've added a fiddle for some sample data and and example of the current query.
http://sqlfiddle.com/#!1/efa42/1
How to add the corresponding timestamp to the maximum?
Try a cross join with two sub queries, the first one selects all records, the second one gets one row that represents the time_stamp of the max value, <3;"1000-01-01"> for example.
SELECT col_value,col_timestamp,max_col_value, col_timestamp_of_max_value FROM table1
cross join
(
select max(col_value) max_col_value ,col_timestamp col_timestamp_of_max_value from table1
group by col_timestamp
order by max_col_value desc
limit 1
) A --One row that represents the time_stamp of the max value, ie: <3;"1000-01-01">
Use the window cause you use with pg
Select *, max( value ) over (), max( timestamp ) over() from table
That gives you the max values from all values in every row
http://www.postgresql.org/docs/9.1/static/tutorial-window.html