How to sum over the previous n days for a number of dates in PostgreSQL

I have a list of dates each with a value in Postgresql.
For each date I want to sum the value for this date and the previous 4 days.
I also want to sum the values for the start of that month to the present date. So for example:
For 07/02/2021 sum all values from 07/02/2021 to 01/02/2021
For 06/02/2021 sum all values from 06/02/2021 to 01/02/2021
For 31/01/2021 sum all values from 31/01/2021 to 01/01/2021
The output, which will be created as two separate tables, should look like this:
Output
Any help would be appreciated.
Thanks

Sample data and structure: dbfiddle
For the first part of the query:
select date,
value,
sum(value) over (
order by to_date(date, 'DD/MM/YYYY')
rows between 4 preceding and current row) as five_day_period
from your_table_name
order by to_date(date, 'DD/MM/YYYY') desc;
For the second part of the query:
select date,
value,
sum(value)
over (
partition by regexp_replace(date, '[0-9]{2}/(.+)', '\1')
order by to_date(date, 'DD/MM/YYYY')
rows between unbounded preceding and current row) as month_to_date
from your_table_name
order by to_date(date, 'DD/MM/YYYY') desc;
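If the dates can be stored (or cast once) as a proper date column, both window definitions get simpler and the regular expression is no longer needed. A minimal sketch, assuming a date-typed column named date rather than the 'DD/MM/YYYY' text used in the fiddle, showing both sums in one pass for brevity:
-- Sketch only: assumes "date" is a real date column, not text.
select date,
       value,
       sum(value) over (
         order by date
         rows between 4 preceding and current row) as five_day_period,
       sum(value) over (
         partition by date_trunc('month', date)   -- month derived directly, no regexp needed
         order by date
         rows between unbounded preceding and current row) as month_to_date
from your_table_name
order by date desc;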

Related

How do I combine cast and date max in Redshift?

'date' is in timestamp format and has duplicates for the same event id in my event table.
If I am doing a subquery that gets max(date) in order to remove duplicates, can I also cast that max to a date instead of a timestamp and wrap it in max? I'm hoping to avoid unnecessary subqueries. Thank you.
Of course you can transform a column (by casting or whatever you like) before applying an aggregate function to it. I am not sure how you remove duplicates. Have a look at some example transformations using GROUP BY:
with input (dt, v) AS (
SELECT '2020-12-20T12:00'::timestamp, 10 UNION ALL
SELECT '2020-12-20T13:00'::timestamp, 20 UNION ALL
SELECT '2020-12-20T14:00'::timestamp, 30
)
select
dt::date,
max(dt),
max(dt::date),
max(date_trunc('month', dt)),
max(last_day(dt)::timestamp),
avg(v),
count(*)
from input
group by 1
dt         | max                        | max        | max                        | max                        | avg | count
2020-12-20 | 2020-12-20 14:00:00.000000 | 2020-12-20 | 2020-12-01 00:00:00.000000 | 2020-12-31 00:00:00.000000 | 20  | 3
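For the deduplication part, a minimal sketch of what the question seems to describe, assuming a hypothetical events table with an event_id column and a timestamp column named "date": keep one row per event id and cast the latest timestamp to a plain date in the same step.
-- Hypothetical table/column names (events, event_id, "date").
-- The cast can sit outside the MAX as here, or inside as MAX("date"::date).
SELECT event_id,
       MAX("date")::date AS latest_date
FROM events
GROUP BY event_id;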

How do I calculate cumulative sum for last 7 rows on a specific date in Postgresql?

I have a table that has these columns: user_id, day, valueA, valueB.
I'd like to calculate the running sum of last 7 rows of valueA and valueB for each user that has data on a specific day, for example '2020-08-01'.
(Note: Users only have a row when their valueA and valueB is not zero so there are some dates not in the table.)
I tried this query:
select user_id, day,
sum(valueA) over(partition by user_id rows between 7 preceding and current row) as last_7_A,
sum(valueB) over(partition by user_id rows between 7 preceding and current row) as last_7_B
from table where day='2020-08-01'
But this query doesn't calculate the running sum; it just returns valueA and valueB for 2020-08-01.
I could calculate it for every day and then select the date I want, but that would be really inefficient. Any idea how to add the date constraint so the last-7-rows running sum is only calculated for each user's row on that one date?
As per the question (the sum of the last 7 rows for each user as of a particular date), this might work:
select user_id, sum(valueA) "sum of valueA", sum(valueB) "sum of valueB"
from sample_table
where id in (
select id
from sample_table
where day='2020-08-08'
order by id desc limit 7)
group by user_id;
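An alternative sketch using the window function from the question, assuming the same columns (user_id, day, valueA, valueB) and the sample_table name above: compute the rolling sums for every row first, then keep only the date of interest in an outer query, since a WHERE on the inner query would throw away the history the window needs.
select user_id, day, last_7_a, last_7_b
from (
  select user_id, day,
         sum(valueA) over (partition by user_id
                           order by day
                           rows between 6 preceding and current row) as last_7_a,
         sum(valueB) over (partition by user_id
                           order by day
                           rows between 6 preceding and current row) as last_7_b
  from sample_table
) s
where day = '2020-08-01';  -- 6 preceding + current row = 7 rows per user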

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id unique start_date end_date data
1 a 2019-01-01 2019-01-31 X
2 a 2019-02-01 2019-02-28 Y
3 b 2019-01-01 2019-06-30 Y
Plan Table:
id item_unique start_date end_date
1 a 2019-01-01 2019-01-10
2 a 2019-01-15 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique from to
a 2019-01-11 2019-01-14
b 2019-01-01 2019-06-30
step-by-step demo: db<>fiddle
WITH excepts AS (
SELECT
item,
generate_series(start_date, end_date, interval '1 day') gs
FROM items
EXCEPT
SELECT
item,
generate_series(start_date, CASE WHEN end_date = 'infinity' THEN ( SELECT MAX(end_date) as max_date FROM items) ELSE end_date END, interval '1 day')
FROM plan
)
SELECT
item,
MIN(gs::date) AS start_date,
MAX(gs::date) AS end_date
FROM (
SELECT
*,
SUM(same_day) OVER (PARTITION BY item ORDER BY gs) AS sum  -- running interval id, referenced in the outer GROUP BY
FROM (
SELECT
item,
gs,
COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
FROM excepts
) s
) s
GROUP BY item, sum
ORDER BY 1,2
Finding the missing days is quite simple. This is done within the WITH clause:
Generate all days of each date range in the first table and subtract the expanded day list of the second table from it. All dates that do not occur in the second table are kept. The infinity end is a little bit tricky, so I replaced the infinity occurrence with the max date of the first table. This avoids expanding an infinite list of dates.
The more interesting part is reaggregating this list, which is the part outside the WITH clause:
The lag() window function takes the previous date. If the current date is more than one day after the previous one, the row is flagged with 1 (a time-change issue occurred here: this is why I do not test for a one-day difference but for a two-day difference; between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time).
These 0 and 1 values are summed cumulatively. Every gap of more than one day starts a new interval (the days in between are covered).
This yields a groupable column which can be used to find the min and max date of each interval.
I also tried something with date ranges, which seems to be the better way, especially to avoid expanding long date lists, but I didn't come up with a proper solution. Maybe someone else will?
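A possible range-based sketch, assuming PostgreSQL 14 or later (range_agg and multirange support) and the same table/column names as the query above: aggregate each table into a date multirange per item, subtract the planned ranges from the item ranges, and unnest whatever gaps remain. An 'infinity' end becomes an unbounded range, so no date list has to be expanded at all.
-- Sketch only: requires PostgreSQL 14+ for range_agg() and multiranges.
SELECT i.item,
       lower(gap)     AS start_date,
       upper(gap) - 1 AS end_date       -- ranges are half-open, so step back one day
FROM (
    SELECT item,
           range_agg(daterange(start_date, end_date, '[]')) AS covered
    FROM items
    GROUP BY item
) i
LEFT JOIN (
    SELECT item,
           range_agg(daterange(start_date, NULLIF(end_date, 'infinity'), '[]')) AS planned
    FROM plan
    GROUP BY item
) p USING (item)
CROSS JOIN LATERAL
    unnest(i.covered - COALESCE(p.planned, '{}'::datemultirange)) AS gap
ORDER BY 1, 2;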

BigQuery - DATE_TRUNC error

I am trying to get monthly aggregated data from a legacy table, meaning the date columns are strings:
amount date_create
100 2018-01-05
200 2018-02-03
300 2018-01-22
However, the command
Select DATE_TRUNC(DATE date_create, MONTH) as month,
sum(amount) as amount_m
from table
group by 1
Returns the following error:
Error: Syntax error: Expected ")" but got identifier "date_create"
Why does this query not run and what can be done to avoid the issue?
Thanks
It looks like you meant to cast date_create instead of using the DATE keyword (which is how you construct a literal value) there. Try this instead:
Select DATE_TRUNC(DATE(date_create), MONTH) as month,
sum(amount) as amount_m
from table
GROUP BY 1
I figured it out:
date_trunc(cast(date_create as date), MONTH) as Month
Another option for BigQuery Standard SQL is using the PARSE_DATE function:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 100 amount, '2018-01-05' date_create UNION ALL
SELECT 200, '2018-02-03' UNION ALL
SELECT 300, '2018-01-22'
)
SELECT
DATE_TRUNC(PARSE_DATE('%Y-%m-%d', date_create), MONTH) AS month,
SUM(amount) AS amount_m
FROM `project.dataset.table`
GROUP BY 1
with the result:
Row  month       amount_m
1    2018-01-01  400
2    2018-02-01  200
In practice, I prefer PARSE_DATE over CAST, as the former documents the expected data format.
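If some of the strings might not match that format, SAFE.PARSE_DATE can be swapped in; it returns NULL for malformed values instead of raising an error. A sketch (not from the original answers), using the same sample table:
#standardSQL
SELECT
  DATE_TRUNC(SAFE.PARSE_DATE('%Y-%m-%d', date_create), MONTH) AS month,  -- NULL month for bad strings
  SUM(amount) AS amount_m
FROM `project.dataset.table`
GROUP BY 1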
Try adding quotes around date_create:
Select DATE_TRUNC('date_create', MONTH) as month,
sum(amount) as amount_m
from table
group by 1

Customize query of postgresql

I am using a PostgreSQL database. I have two queries, but I don't want to run multiple queries; is it possible to get this with a single query?
Query 1 :
select coalesce(sum("dummy"), 0) as sum
from generate_series('2014-09-09 00:00:00'::timestamp, '2014-09-09 23:59:59', '1 minute') minutes(minute)
left join report on minutes.minute = date_trunc('minute', report.fetchdate)
  and fetchdate >= '2014-09-09 00:00:00' and fetchdate <= '2014-09-09 23:59:00'
  and entity_id = '0'
group by minute
order by minute
OUTPUT:
The total count of the dummy field for each minute of the day, i.e. each day has 24*60 = 1440 records.
Note: this query is used for a single day.
Query 2:
select date(day) as day, coalesce(sum("dummy"), 0) as sum
from generate_series('2014-09-06 00:00:01'::date, '2014-09-12 23:59:59'::date, '1 day'::interval) days(day)
left join report on days.day = date_trunc('day', report.fetchdate)
  and entity_id = '0'
group by day
order by day
OUTPUT:
The total count of the dummy field for each day between 2014-09-06 and 2014-09-12, i.e. 7 records in total (dates 6, 7, 8, 9, 10, 11, 12).
Note: this query is used for more than one day.
Required output:
1) The total count of the dummy field for each day between the specified dates (the output of the second query).
2) The maximum call count for each day.
Example: if I search over two days, the result should be broken down per day; the dummy field is counted for each minute of each day, and the highest per-minute count of a day should be reported as that day's maximum call count.
select
    date_trunc('day', minute) as day,
    sum(minute_sum) as day_sum,         -- total count of "dummy" per day
    max(minute_sum) as max_minute_sum   -- highest per-minute count of the day
from (
    -- per-minute counts over the whole period, including minutes with no rows
    select
        minute,
        coalesce(sum("dummy"), 0) as minute_sum
    from
        generate_series(
            '2014-09-06'::timestamp,
            '2014-09-13'::timestamp - interval '1 minute',
            '1 minute'
        ) minutes(minute)
    left join
        report on
            minutes.minute = date_trunc('minute', report.fetchdate)
            and entity_id = '0'
    group by minute
) s
group by 1
order by 1