PostgreSQL - aggregate series interval 2 years

I have some data:
id_merchant | data | sell
11 | 2009-07-20 | 1100.00
22 | 2009-07-27 | 1100.00
11 | 2005-07-27 | 620.00
31 | 2009-08-07 | 2403.20
33 | 2009-08-12 | 4822.00
52 | 2009-08-14 | 4066.00
52 | 2009-08-15 | 295.00
82 | 2009-08-15 | 0.00
23 | 2011-06-11 | 340.00
23 | 2012-03-22 | 1000.00
23 | 2012-04-08 | 1000.00
23 | 2012-07-13 | 36.00
23 | 2013-07-17 | 2480.00
23 | 2014-04-09 | 1000.00
23 | 2014-06-10 | 1500.00
23 | 2014-07-20 | 700.50
I want to create a table (as a SELECT) with a 2-year interval. The first date for each merchant is min(date), so I generate the series with generate_series(min(date)::date, current_date, '2 years').
I want to get a table like this:
id_merchant | data | sum(sell)
23 | 2011-06-11 | 12382.71
23 | 2013-06-11 | 12382.71
23 | 2015-06-11 | 12382.71
But there is some mistake in my query, because sum(sell) is the same for every row of the series and the sum is wrong. Even if I sum sell by hand, it comes to about 6000, not 12382.71.
My query:
select m.id_gos_pla,
generate_series(m.min::date,dath()::date,'2 years')::date,
sum(rch.suma)
from rch, minmax m
where rch.id_gos_pla=m.id_gos_pla
group by m.id_gos_pla,m.min,m.max
order by 1,2;
Please help.

I would do it this way:
select
periods.id_merchant,
periods.date as period_start,
(periods.date + interval '2' year - interval '1' day)::date as period_end,
coalesce(sum(merchants.amount), 0) as sum
from
(
select
id_merchant,
generate_series(min(date), max(date), '2 year'::interval)::date as date
from merchants
group by id_merchant
) periods
left join merchants on
periods.id_merchant = merchants.id_merchant and
merchants.date >= periods.date and
merchants.date < periods.date + interval '2' year
group by periods.id_merchant, periods.date
order by periods.id_merchant, periods.date
We use a sub-query to generate the date periods for each id_merchant according to the first date for that merchant and the required interval. Then we join it with the merchants table on a date-within-period condition and group by merchant_id and period (periods.date, the period's starting date, is enough). And finally we select everything we need: starting date, ending date, merchant and sum.
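If the periods should instead run up to today, as the generate_series(min(date), current_date, '2 years') in the question suggests, only the sub-query changes; a minimal sketch against the same assumed merchants table:
select
    id_merchant,
    -- periods extend from each merchant's first sale up to the current date
    generate_series(min(date), current_date, '2 year'::interval)::date as date
from merchants
group by id_merchant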

Related

How to sum for previous n number of days for a number of dates in MySQL

I have a list of dates, each with a value, in MySQL.
For each date I want to sum the value for this date and the previous 4 days.
I also want to sum the values from the start of that month to the date in question. So for example:
For 07/02/2021 sum all values from 07/02/2021 to 01/02/2021
For 06/02/2021 sum all values from 06/02/2021 to 01/02/2021
For 31/01/2021 sum all values from 31/01/2021 to 01/01/2021
The output should look like:
Any help would be appreciated.
Thanks
In MySQL 8.0 you can use analytic/window functions.
SELECT
*,
SUM(value) OVER (
ORDER BY date
ROWS BETWEEN 4 PRECEDING
AND CURRENT ROW
) AS five_day_period,
SUM(value) OVER (
PARTITION BY DATE_FORMAT(date, '%Y-%m-01')
ORDER BY date
) AS month_to_date
FROM
your_table
In the first case, it's just saying sum up the value column, in date order, starting from 4 rows before the current row, and ending on the current row.
In the second case, there's no ROWS BETWEEN, so it defaults to all the rows from the start of the partition up to the current row. Instead, we add a PARTITION BY, which says to treat all rows in the same calendar month separately from rows in a different calendar month. Thus, "all rows before the current one" only looks back to the first row in the partition, which is the first row of the current month.
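For reference, the default frame that month_to_date relies on can be spelled out explicitly; the following is equivalent to the expression above:
SUM(value) OVER (
    PARTITION BY DATE_FORMAT(date, '%Y-%m-01')
    ORDER BY date
    -- this is the implicit default frame when ORDER BY is present
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS month_to_date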
In MySQL 5.x there are no such functions. As such I would resort to correlated sub-queries.
SELECT
*,
(
SELECT SUM(value)
FROM your_table AS five_day_lookup
WHERE date >= DATE_SUB(your_table.date, INTERVAL 4 DAY)
AND date <= your_table.date
)
AS five_day_period,
(
SELECT SUM(value)
FROM your_table AS monthly_lookup
WHERE date >= DATE(DATE_FORMAT(your_table.date, '%Y-%m-01'))
AND date <= your_table.date
)
AS month_to_date
FROM
your_table
Here is another way to do it:
Select
t1.`mydate` AS 'Date'
, t1.`val` AS 'Value'
, SUM( IF(t2.`mydate` >= t1.`mydate` - INTERVAL 4 DAY,t2.val,0)) AS '5 Day Period'
, SUM( IF(t2.`mydate` >= DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),t2.val,0)) AS 'Month of Date'
FROM tab t1
LEFT JOIN tab t2 ON t2.`mydate`
BETWEEN LEAST( DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),
t1.`mydate` - INTERVAL 4 DAY)
AND t1.`mydate`
GROUP BY t1.`mydate`
ORDER BY t1.`mydate` desc;
sample
MariaDB [bkvie]> SELECT * FROM tab;
+----+------------+------+
| id | mydate | val |
+----+------------+------+
| 1 | 2021-02-07 | 10 |
| 2 | 2021-02-06 | 30 |
| 3 | 2021-02-05 | 40 |
| 4 | 2021-02-04 | 50 |
| 5 | 2021-02-03 | 10 |
| 6 | 2021-02-02 | 20 |
| 7 | 2021-01-31 | 20 |
| 8 | 2021-01-30 | 10 |
| 9 | 2021-01-29 | 30 |
| 10 | 2021-01-28 | 40 |
| 11 | 2021-01-27 | 20 |
| 12 | 2021-01-26 | 30 |
| 13 | 2021-01-25 | 10 |
| 14 | 2021-01-24 | 40 |
| 15 | 2021-02-01 | 10 |
+----+------------+------+
15 rows in set (0.00 sec)
result
MariaDB [bkvie]> Select
-> t1.`mydate` AS 'Date'
-> , t1.`val` AS 'Value'
-> , SUM( IF(t2.`mydate` >= t1.`mydate` - INTERVAL 4 DAY,t2.val,0)) AS '5 Day Period'
-> , SUM( IF(t2.`mydate` >= DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),t2.val,0)) AS 'Month of Date'
-> FROM tab t1
-> LEFT JOIN tab t2 ON t2.`mydate`
-> BETWEEN LEAST( DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),
-> t1.`mydate` - INTERVAL 4 DAY)
-> AND t1.`mydate`
-> GROUP BY t1.`mydate`
-> ORDER BY t1.`mydate` desc;
+------------+-------+--------------+---------------+
| Date | Value | 5 Day Period | Month of Date |
+------------+-------+--------------+---------------+
| 2021-02-07 | 10 | 140 | 170 |
| 2021-02-06 | 30 | 150 | 160 |
| 2021-02-05 | 40 | 130 | 130 |
| 2021-02-04 | 50 | 110 | 90 |
| 2021-02-03 | 10 | 70 | 40 |
| 2021-02-02 | 20 | 90 | 30 |
| 2021-02-01 | 10 | 110 | 10 |
| 2021-01-31 | 20 | 120 | 200 |
| 2021-01-30 | 10 | 130 | 180 |
| 2021-01-29 | 30 | 130 | 170 |
| 2021-01-28 | 40 | 140 | 140 |
| 2021-01-27 | 20 | 100 | 100 |
| 2021-01-26 | 30 | 80 | 80 |
| 2021-01-25 | 10 | 50 | 50 |
| 2021-01-24 | 40 | 40 | 40 |
+------------+-------+--------------+---------------+
15 rows in set (0.00 sec)
MariaDB [bkvie]>

Postgres max value per hour with time it occurred

Given a Postgres table with columns highwater_datetime::timestamp and highwater::integer, I am trying to construct a SELECT statement for a given highwater_datetime range that generates one row per hour with the max highwater for that hour (first occurrence when there are duplicates), together with the highwater_datetime when it occurred (truncated to the minute and ordered by highwater_datetime asc), e.g.
| highwater_datetime | max_highwater |
+--------------------+---------------+
| 2021-01-27 20:05 | 8 |
| 2021-01-27 21:00 | 7 |
| 2021-01-27 22:00 | 7 |
| 2021-01-27 23:00 | 7 |
| 2021-01-28 00:00 | 7 |
| 2021-01-28 01:32 | 7 |
| 2021-01-28 02:00 | 7 |
| 2021-01-28 03:00 | 7 |
| 2021-01-28 04:22 | 9 |
DISTINCT ON should do the trick:
SELECT DISTINCT ON (date_trunc('hour', highwater_datetime))
highwater_datetime,
highwater
FROM mytable
ORDER BY date_trunc('hour', highwater_datetime),
highwater DESC,
highwater_datetime;
DISTINCT ON will output the first row for each entry with the same hour according to the ORDER BY clause.
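If you also want the reported timestamp truncated to the minute, as the question mentions, only the select list changes; a sketch, assuming the same mytable:
SELECT DISTINCT ON (date_trunc('hour', highwater_datetime))
       -- report the timestamp of the max value, truncated to the minute
       date_trunc('minute', highwater_datetime) AS highwater_datetime,
       highwater
FROM mytable
ORDER BY date_trunc('hour', highwater_datetime),
         highwater DESC,
         highwater_datetime;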

Query to verify number 24 hours before the calling

I have a table in Postgres like this, with:
- the agent who answered the call,
- the caller's number,
- the datetime of the end of the call.
Agent | Caller | DateTime
------------------------------------
101 | 555-1234 | 12-01-16 00:00
101 | 555-1234 | 12-01-16 01:00
102 | 555-1234 | 12-01-16 02:00
102 | 555-1234 | 13-01-16 06:00
1º - I need to check each number (each line) and look at the 24 hours before the call, to see whether it is a recall, e.g. the call on the 13th was not a recall because the previous call from this number was not within 24 hours.
2º - I need to get the agent the recall came from (the agent who answered the previous call).
I need to display it like this:
Agent | Caller | DateTime | Recalling | Recalling-From
----------------------------------------------------------------------
101 | 555-1234 | 12-01-16 00:00 | NO |
101 | 555-1234 | 12-01-16 01:00 | YES | 101
102 | 555-1234 | 12-01-16 02:00 | YES | 101
102 | 555-1234 | 13-01-16 06:00 | NO |
My query is:
WITH maiordata AS (
SELECT
to_char(datahora_entrada_fila, 'DD/MM/YYYY') as dia,
calleridnum As numero,
MAX(datahora_inicio) AS data_ini
FROM callcenter.chamada_fila_in cfin
WHERE datahora_inicio BETWEEN '2016-12-01 00:00:00' AND '2016-12-01 23:59:59'
AND status_chamada = 'Finalizada'
GROUP BY dia,numero
)
SELECT
to_char(datahora_entrada_fila, 'DD/MM/YYYY') as dia,
(SELECT count(calleridnum) FROM callcenter.chamada_fila_in f INNER JOIN maiordata md ON f.calleridnum = md.numero WHERE calleridnum = cfin.calleridnum AND datahora_inicio BETWEEN md.data_ini - interval '24 hours' AND md.data_ini ) AS QtdRechamada,
calleridnum As numero
FROM callcenter.chamada_fila_in cfin
WHERE datahora_inicio BETWEEN '2016-12-01 00:00:00' AND '2016-12-01 23:59:59'
AND status_chamada = 'Finalizada'
GROUP BY dia,numero
ORDER BY
dia,numero DESC
I need a better function or method; this query is very heavy on my database and needs to be optimized.
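A single pass with LAG() avoids running a correlated sub-query per row and usually runs much lighter; a sketch only, assuming the answering agent lives in a column called agent (adjust to the real column name) and reusing the calleridnum, datahora_inicio and status_chamada columns from the query above:
SELECT agent,
       calleridnum,
       datahora_inicio,
       CASE WHEN prev_inicio >= datahora_inicio - interval '24 hours'
            THEN 'YES' ELSE 'NO' END AS recalling,
       CASE WHEN prev_inicio >= datahora_inicio - interval '24 hours'
            THEN prev_agent END AS recalling_from
FROM (
    SELECT agent,                         -- assumed name of the agent column
           calleridnum,
           datahora_inicio,
           LAG(datahora_inicio) OVER w AS prev_inicio,  -- previous call from same number
           LAG(agent)           OVER w AS prev_agent
    FROM callcenter.chamada_fila_in
    WHERE status_chamada = 'Finalizada'
    WINDOW w AS (PARTITION BY calleridnum ORDER BY datahora_inicio)
) t
ORDER BY calleridnum, datahora_inicio;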

Calculating the forecasts by month for the last 3 months in postgres

I have a table called forecasts where we store the forecasts for all the products for the next 6 months. For example when we are in November we create the forecast for December, January, February, March, April and May. The forecasts table looks something like the one below
+----------------+---------------+--------------+----------+
| product_number | forecasted_on | forecast_for | quantity |
+----------------+---------------+--------------+----------+
| Prod 1 | 2016-11-01 | 2016-12-01 | 100 |
| Prod 1 | 2016-11-01 | 2017-01-01 | 200 |
| Prod 1 | 2016-11-01 | 2017-02-01 | 300 |
| Prod 1 | 2016-11-01 | 2017-03-01 | 400 |
| Prod 1 | 2016-11-01 | 2017-04-01 | 500 |
| Prod 1 | 2016-11-01 | 2017-05-01 | 600 |
+----------------+---------------+--------------+----------+
Where the table contains a list of product numbers and the date on which the forecast was created i.e. forecasted_on and a month for which the forecast was created for along with the forecasted quantity.
Each month data gets added for the next 6 months. So when the forecasted_on is 1-December-2016 forecasts will be created for January till June.
I am trying to create a report that shows how the total forecasts have varied for the last 3 months. Something like this
+------------+----------------+---------------+----------------+
| | 0 months prior | 1 month prior | 2 months prior |
+------------+----------------+---------------+----------------+
| 2016-12-01 | 200 | 150 | 250 |
| 2017-01-01 | 300 | 250 | 150 |
| 2017-02-01 | 100 | 150 | 100 |
+------------+----------------+---------------+----------------+
Currently I am using a lot of repetitive code in rails to generate this table. I wanted to see if there was an easier way to do it directly using a SQL query.
Any help would be greatly appreciated.
Use a pivot query:
select forecast_for,
sum( case when forecasted_on + interval '1' month = forecast_for
then quantity end ) q_0,
sum( case when forecasted_on + interval '2' month = forecast_for
then quantity end ) q_1,
sum( case when forecasted_on + interval '3' month = forecast_for
then quantity end ) q_2,
sum( case when forecasted_on + interval '4' month = forecast_for
then quantity end ) q_3,
sum( case when forecasted_on + interval '5' month = forecast_for
then quantity end ) q_4,
sum( case when forecasted_on + interval '6' month = forecast_for
then quantity end ) q_5
from Table1
group by forecast_for
order by 1
;
Demo: http://sqlfiddle.com/#!15/30e5e/1
| forecast_for | q_0 | q_1 | q_2 | q_3 | q_4 | q_5 |
|----------------------------|--------|--------|--------|--------|--------|--------|
| December, 01 2016 00:00:00 | 100 | (null) | (null) | (null) | (null) | (null) |
| January, 01 2017 00:00:00 | (null) | 200 | (null) | (null) | (null) | (null) |
| February, 01 2017 00:00:00 | (null) | (null) | 300 | (null) | (null) | (null) |
| March, 01 2017 00:00:00 | (null) | (null) | (null) | 400 | (null) | (null) |
| April, 01 2017 00:00:00 | (null) | (null) | (null) | (null) | 500 | (null) |
| May, 01 2017 00:00:00 | (null) | (null) | (null) | (null) | (null) | 600 |
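On PostgreSQL 9.4 or later the same pivot can also be written with the FILTER clause instead of CASE, which some find easier to read; a sketch equivalent to the query above:
select forecast_for,
       sum(quantity) filter (where forecasted_on + interval '1' month = forecast_for) as q_0,
       sum(quantity) filter (where forecasted_on + interval '2' month = forecast_for) as q_1,
       sum(quantity) filter (where forecasted_on + interval '3' month = forecast_for) as q_2
       -- q_3 .. q_5 follow the same pattern
from Table1
group by forecast_for
order by 1;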
Assuming that (product_number, forecasted_on, forecast_for) is unique (so no aggregation is required), this should do the job:
WITH forecast_dates AS (
SELECT DISTINCT product_number, forecast_for
FROM forecasts
)
SELECT
fd.forecast_for AS "forecast for",
m1.quantity AS "one month prior",
m2.quantity AS "two months prior",
m3.quantity AS "three months prior"
FROM forecast_dates fd
LEFT JOIN forecasts m1 ON fd.product_number = m1.product_number AND fd.forecast_for = m1.forecast_for AND fd.forecast_for = m1.forecasted_on + INTERVAL '1 month'
LEFT JOIN forecasts m2 ON fd.product_number = m2.product_number AND fd.forecast_for = m2.forecast_for AND fd.forecast_for = m2.forecasted_on + INTERVAL '2 month'
LEFT JOIN forecasts m3 ON fd.product_number = m3.product_number AND fd.forecast_for = m3.forecast_for AND fd.forecast_for = m3.forecasted_on + INTERVAL '3 month'
WHERE fd.product_number = 'Prod 1'
ORDER BY fd.forecast_for;

Linear regression with postgres

I use Postgres and I have a large number of rows with a value and a date per station.
(Dates can be separated by several days.)
id | value | idstation | udate
--------+-------+-----------+-----
1 | 5 | 12 | 1984-02-11 00:00:00
2 | 7 | 12 | 1984-02-17 00:00:00
3 | 8 | 12 | 1984-02-21 00:00:00
4 | 9 | 12 | 1984-02-23 00:00:00
5 | 4 | 12 | 1984-02-24 00:00:00
6 | 8 | 12 | 1984-02-28 00:00:00
7 | 9 | 14 | 1984-02-21 00:00:00
8 | 15 | 15 | 1984-02-21 00:00:00
9 | 14 | 18 | 1984-02-21 00:00:00
10 | 200 | 19 | 1984-02-21 00:00:00
Forgive what may be a silly question, but I'm not much of a database guru.
Is it possible to write a SQL query directly that will calculate a linear regression per station for each date, knowing that the regression must be calculated only with the actual row's date, the previous row's date and the next row's date?
For example, the linear regression for id 2 must be calculated with the values 7 (actual), 5 (previous) and 8 (next), for the dates 1984-02-17, 1984-02-11 and 1984-02-21.
Edit: I have to use regr_intercept(value, udate), but I really don't know how to do this if I have to use only the actual, previous and next value/date for each line.
Edit 2: 3 rows were added to idstation 12; the ids and dates changed.
Hope you can help me, thank you!
This is the combination of Joop's statistics and Denis's window functions:
WITH num AS (
SELECT id, idstation
, (udate - '1984-01-01'::date) as idate -- count in days since Jan 1984
, value AS value
FROM thedata
)
-- id + the ids of the {prev,next} records
-- within the same idstation group
, drag AS (
SELECT id AS center
, LAG(id) OVER www AS prev
, LEAD(id) OVER www AS next
FROM thedata
WINDOW www AS (partition by idstation ORDER BY id)
)
-- junction CTE between ID and its three feeders
, tri AS (
SELECT center AS this, center AS that FROM drag
UNION ALL SELECT center AS this , prev AS that FROM drag
UNION ALL SELECT center AS this , next AS that FROM drag
)
SELECT t.this, n.idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM num n
JOIN tri t ON t.that = n.id
GROUP BY t.this, n.idstation
;
Results:
INSERT 0 7
this | idstation | intercept | slope | rsq | avgx | avgy
------+-----------+-------------------+-------------------+-------------------+------------------+------------------
1 | 12 | -46 | 1 | 1 | 52 | 6
2 | 12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
3 | 12 | -10.6666666666667 | 0.333333333333333 | 1 | 54.5 | 7.5
4 | 14 | | | | 51 | 9
5 | 15 | | | | 51 | 15
6 | 18 | | | | 51 | 14
7 | 19 | | | | 51 | 200
(7 rows)
The clustering of the group-of-three can probably be done more elegantly using a rank() or row_number() function, which would also allow larger sliding windows to be used.
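As hinted above, the group-of-three can also be expressed without the junction CTE: in PostgreSQL any aggregate, including the regr_* family, can be used as a window function with an explicit frame, so a sliding three-row regression per station might look like this (a sketch against the same thedata table):
SELECT id, idstation,
       -- regression over the previous, current and next row within the station
       regr_slope(value, udate - date '1984-01-01')     OVER w AS slope,
       regr_intercept(value, udate - date '1984-01-01') OVER w AS intercept
FROM thedata
WINDOW w AS (PARTITION BY idstation
             ORDER BY udate, id
             ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
ORDER BY idstation, id;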
DROP SCHEMA zzz CASCADE;
CREATE SCHEMA zzz ;
SET search_path=zzz;
CREATE TABLE thedata
( id INTEGER NOT NULL PRIMARY KEY
, value INTEGER NOT NULL
, idstation INTEGER NOT NULL
, udate DATE NOT NULL
);
INSERT INTO thedata(id,value,idstation,udate) VALUES
(1 ,5 ,12 ,'1984-02-21' )
,(2 ,7 ,12 ,'1984-02-23' )
,(3 ,8 ,12 ,'1984-02-26' )
,(4 ,9 ,14 ,'1984-02-21' )
,(5 ,15 ,15 ,'1984-02-21' )
,(6 ,14 ,18 ,'1984-02-21' )
,(7 ,200 ,19 ,'1984-02-21' )
;
WITH a AS (
SELECT idstation
, (udate - '1984-01-01'::date) as idate -- count in days since Jan 1984
, value AS value
FROM thedata
)
SELECT idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM a
GROUP BY idstation
;
output:
idstation | intercept | slope | rsq | avgx | avgy
-----------+-------------------+-------------------+-------------------+------------------+------------------
15 | | | | 51 | 15
14 | | | | 51 | 9
19 | | | | 51 | 200
12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
18 | | | | 51 | 14
(5 rows)
Note: if you want a spline-like regression you should also use the lag() and lead() window functions, like in Denis's answer.
If the average is OK for you, you could use the built-in avg(). Something like:
SELECT avg("value") FROM "my_table" WHERE "idstation" = 3;
That should do. For more complicated things I'm afraid you will need to write a PL/pgSQL function or look for a PostgreSQL add-on.
Look into window functions. If I get your question correctly, lead() and lag() will likely give you precisely what you want. Example usage:
select idstation as idstation,
id as curr_id,
udate as curr_date,
lag(id) over w as prev_id,
lag(udate) over w as prev_date,
lead(id) over w as next_id,
lead(udate) over w as next_date
from dates
window w as (
partition by idstation order by udate, id
)
order by idstation, udate, id
http://www.postgresql.org/docs/current/static/tutorial-window.html