How do I calculate cumulative sum for last 7 rows on a specific date in Postgresql? - postgresql

I have a table that has these columns: user_id, day, valueA, valueB.
I'd like to calculate the running sum of last 7 rows of valueA and valueB for each user that has data on a specific day, for example '2020-08-01'.
(Note: Users only have a row when their valueA and valueB is not zero so there are some dates not in the table.)
I tried this query:
select user_id, day,
sum(valueA) over(partition by user_id rows between 7 preceding and current row) as last_7_A,
sum(valueB) over(partition by user_id rows between 7 preceding and current row) as last_7_B
from table where day='2020-08-01'
But this query doesn't calculate the running sum; it only returns valueA and valueB for 2020-08-01.
I could calculate the running sum for every day and then select the date I want, but that would be really inefficient. Any ideas on how to add the date constraint so that it only calculates the last-7-rows sum for that one row per user?

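As per the question (sum of the last 7 rows for each user as of a particular date), this might work.
The WHERE clause is applied before window functions, so filtering on day in the same query leaves each user's window with a single row. Compute the window in a subquery and filter on the date afterwards. Note that 7 rows including the current one means "6 preceding", and that the window needs an order by day:
select user_id, day, last_7_A, last_7_B
from (
select user_id, day,
sum(valueA) over (partition by user_id order by day rows between 6 preceding and current row) as last_7_A,
sum(valueB) over (partition by user_id order by day rows between 6 preceding and current row) as last_7_B
from sample_table
where day <= '2020-08-01'
) t
where day = '2020-08-01';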
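To see the effect of filtering before versus after the window is computed, here is a minimal sketch. It runs the SQL against an in-memory SQLite database from Python purely so it can be tried without a server; the table name sample_table and all rows are made up for illustration, and the SQL uses only syntax that PostgreSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sample_table (user_id int, day text, valueA int, valueB int)")

# Made-up data: user 1 has 9 rows (with gaps between days), user 2 has 2 rows.
rows = [(1, f"2020-07-{d:02d}", 1, 10) for d in (20, 22, 25, 26, 27, 28, 30, 31)]
rows += [(1, "2020-08-01", 1, 10), (2, "2020-07-31", 5, 50), (2, "2020-08-01", 7, 70)]
conn.executemany("insert into sample_table values (?, ?, ?, ?)", rows)

window = "(partition by user_id order by day rows between 6 preceding and current row)"

# Filtering in the same query: WHERE runs first, so each window sees one row.
naive = conn.execute(f"""
    select user_id, day, sum(valueA) over {window}, sum(valueB) over {window}
    from sample_table
    where day = ?
    order by user_id
""", ("2020-08-01",)).fetchall()

# Filtering after the window has been computed in a subquery.
result = conn.execute(f"""
    select user_id, day, last_7_A, last_7_B
    from (
        select user_id, day,
               sum(valueA) over {window} as last_7_A,
               sum(valueB) over {window} as last_7_B
        from sample_table
        where day <= ?
    ) t
    where day = ?
    order by user_id
""", ("2020-08-01", "2020-08-01")).fetchall()

print(naive)   # only the values of the single day
print(result)  # sums over up to 7 rows per user
```

The inner "day <= ?" filter is optional; it only keeps later rows out of the window computation.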

Related

postgresql - cumulative sum group by type and week (multiple columns)

I want to have a cumulative sum, but my condition needs to group by multiple columns.
table: customer
type  week     id
A     2022-01  abc123
B     2022-01  bcd123
B     2022-02  efg123
A     2022-02  klc123
B     2022-02  mad123
My query now:
SELECT week, type, SUM(cnt) OVER (ORDER BY week)
FROM (SELECT week, type, COUNT(*) AS cnt
FROM customer
GROUP BY week, type) t
ORDER BY 1 ASC
and the results:
week     type  Sum
2022-01  A     1
2022-01  B     1
2022-02  A     1
2022-02  B     1
The issue is here: the last row of the result should be Sum=2, but for some reason (I don't know why) it follows the rows above.
Are there other ways to calculate the cumulative sum?
Thank you
SELECT week, type,
SUM(cnt) OVER (PARTITION BY type
ORDER BY week
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM (SELECT week, type, COUNT(*) AS cnt
FROM customer
GROUP BY week, type) t
ORDER BY 1 ASC
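The clause "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" is optional here. When ORDER BY is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same result as long as the week values are unique within each partition.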
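For a running total per type, the window has to be partitioned by type only and ordered by week; if week is also part of PARTITION BY, every partition holds a single row and the sum is just cnt. A minimal sketch using the customer rows from the question, run against an in-memory SQLite database from Python as a stand-in for PostgreSQL (the alias running_total is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (type text, week text, id text)")
conn.executemany("insert into customer values (?, ?, ?)", [
    ("A", "2022-01", "abc123"),
    ("B", "2022-01", "bcd123"),
    ("B", "2022-02", "efg123"),
    ("A", "2022-02", "klc123"),
    ("B", "2022-02", "mad123"),
])

# Count per (week, type) first, then accumulate the counts week by week within each type.
query = """
select week, type,
       sum(cnt) over (partition by type order by week) as running_total
from (select week, type, count(*) as cnt
      from customer
      group by week, type) t
order by week, type
"""
cumulative = conn.execute(query).fetchall()
for row in cumulative:
    print(row)
```

With this data, type B has one row in 2022-01 and two in 2022-02, so its running total for 2022-02 is 3.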

How to sum for previous n number of days for a number of dates in PostgreSQL

I have a list of dates each with a value in Postgresql.
For each date I want to sum the value for this date and the previous 4 days.
I also want to sum the values for the start of that month to the present date. So for example:
For 07/02/2021 sum all values from 07/02/2021 to 01/02/2021
For 06/02/2021 sum all values from 06/02/2021 to 01/02/2021
For 31/01/2021 sum all values from 31/01/2021 to 01/01/2021
The output, which will be created as two separate tables, should look like this:
Output
Any help would be appreciated.
Thanks
Sample data and structure: dbfiddle
For first part of query:
select date,
value,
sum(value) over (
order by to_date(date, 'DD/MM/YYYY')
rows between 4 preceding and current row) as five_day_period
from your_table_name
order by to_date(date, 'DD/MM/YYYY') desc;
For second part of query:
select date,
value,
sum(value)
over (
partition by regexp_replace(date, '[0-9]{2}/(.+)', '\1')
order by to_date(date, 'DD/MM/YYYY')
rows between unbounded preceding and current row) as month_to_date
from your_table_name
order by to_date(date, 'DD/MM/YYYY') desc;
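Both windows can be tried on a small dataset. The sketch below is an illustration only: it uses an in-memory SQLite database from Python, a made-up table daily with ISO-formatted dates, and strftime('%Y-%m', d) in place of the to_date/regexp_replace calls above. Keep in mind that "rows between 4 preceding" counts rows, not calendar days, so it matches "the previous 4 days" only when there is exactly one row per day with no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table daily (d text, value int)")
# Made-up values, one row per day across a month boundary.
conn.executemany("insert into daily values (?, ?)", [
    ("2021-01-30", 1),
    ("2021-01-31", 2),
    ("2021-02-01", 3),
    ("2021-02-02", 4),
    ("2021-02-03", 5),
])

query = """
select d, value,
       sum(value) over (order by d
                        rows between 4 preceding and current row) as five_day_period,
       sum(value) over (partition by strftime('%Y-%m', d)
                        order by d
                        rows between unbounded preceding and current row) as month_to_date
from daily
order by d desc
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

For 2021-02-03 the five-row sum spans both months (1+2+3+4+5), while the month-to-date sum restarts on 2021-02-01 (3+4+5).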

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month. This calculation is to be done for 12 months in a single query. The output should look like this:
Month Count
01/07/2019 50
01/08/2019 34
01/09/2019 23
01/10/2019 98
01/11/2019 10
01/12/2019 5
01/01/2020 32
01/02/2020 65
01/03/2020 23
01/04/2020 12
01/05/2020 64
01/06/2020 54
01/07/2020 78
I am able to get the value only for one month. I want to get it for all months in a single query.
This is my current query:
SELECT COUNT(DISTINCT TWO_MONTHS_AGO.USER_ID), TWO_MONTHS_AGO.MONTH AS INVOICE_MONTH
FROM (
SELECT USER_ID, LAST_DAY(invoice_ct_dt)) AS MONTH
FROM table a AS ID
WHERE invoice_amt > 0
AND LAST_DAY(invoice_ct_dt)) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), - 2)
GROUP BY user_id
) AS TWO_MONTHS_AGO
LEFT JOIN (
SELECT user_id,LAST_DAY(invoice_ct_dt)) AS MONTH
FROM table a AS ID
AND LAST_DAY(invoice_ct_dt)) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), - 1)
GROUP BY USER_ID
) AS ONE_MONTH_AGO ON TWO_MONTHS_AGO.USER_ID = ONE_MONTH_AGO.USER_ID
WHERE ONE_MONTH_AGO.USER_ID IS NULL
GROUP BY INVOICE_MONTH;
Thank you in advance.
Lona
There are probably lots of different approaches, but the way I would do it is as follows:
1. Summarise data by user and month for the last 13 months (you need 12 months plus the month before the first of them)
2. Compare "this" month (that has data) to "next" month and select records where there is no "next" month data
3. Summarise this dataset by month and distinct userid
For example, assuming a table created as follows:
create table INVOICE_DATA (
USERID varchar(4),
INVOICE_DT date,
INVOICE_AMT NUMBER(10,2)
);
the following query should give you what you want - you may need to adjust it depending on whether you are including this month, or only up to the end of last month, in your calculation, etc.:
--Summarise data by user and month
WITH MONTH_SUMMARY AS
(
SELECT USERID
,TO_CHAR(INVOICE_DT,'YYYY-MM') "INVOICE_MONTH"
,TO_CHAR(ADD_MONTHS(INVOICE_DT,1),'YYYY-MM') "NEXT_MONTH"
,SUM(INVOICE_AMT) "MONTHLY_TOTAL"
FROM INVOICE_DATA
WHERE INVOICE_DT >= TRUNC(ADD_MONTHS(current_date(),-13),'MONTH') -- Last 13 months of data
GROUP BY 1,2,3
),
--Get data for users with invoices in this month but not the next month
USER_DATA AS
(
SELECT USERID, INVOICE_MONTH, MONTHLY_TOTAL
FROM MONTH_SUMMARY MS_THIS
WHERE NOT EXISTS
(
SELECT USERID
FROM MONTH_SUMMARY MS_NEXT
WHERE
MS_THIS.USERID = MS_NEXT.USERID AND
MS_THIS.NEXT_MONTH = MS_NEXT.INVOICE_MONTH
)
AND MS_THIS.INVOICE_MONTH < TO_CHAR(current_date(),'YYYY-MM') -- Don't include this month as obviously no next month to compare to
)
SELECT INVOICE_MONTH, COUNT(DISTINCT USERID) "USER_COUNT"
FROM USER_DATA
GROUP BY INVOICE_MONTH
ORDER BY INVOICE_MONTH
;
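The three steps can be tried end to end on a handful of rows. This is a sketch under assumptions, not the query above verbatim: it runs in an in-memory SQLite database from Python, replaces ADD_MONTHS and TO_CHAR with SQLite's date() and strftime(), passes the "current month" in as a parameter instead of calling current_date, and adds a monthly_total > 0 filter to reflect the question's "amount greater than 0" condition. All rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table invoice_data (userid text, invoice_dt text, invoice_amt real)")
conn.executemany("insert into invoice_data values (?, ?, ?)", [
    ("u1", "2020-01-15", 10), ("u1", "2020-02-10", 5),   # stops after February
    ("u2", "2020-01-20", 7),                             # stops after January
    ("u3", "2020-02-05", 3), ("u3", "2020-03-01", 4),    # still active in March
    ("u4", "2020-03-12", 9),                             # only in the current month
    ("u5", "2020-01-05", 0),                             # invoiced, but for 0
])

query = """
with month_summary as (
    -- Step 1: one row per user and month, plus the label of the following month
    select userid,
           strftime('%Y-%m', invoice_dt) as invoice_month,
           strftime('%Y-%m', date(invoice_dt, 'start of month', '+1 month')) as next_month,
           sum(invoice_amt) as monthly_total
    from invoice_data
    group by userid, invoice_month, next_month
)
-- Steps 2 and 3: keep user-months with no row in the following month, then count users
select ms_this.invoice_month, count(distinct ms_this.userid) as user_count
from month_summary ms_this
where not exists (
        select 1
        from month_summary ms_next
        where ms_next.userid = ms_this.userid
          and ms_next.invoice_month = ms_this.next_month)
  and ms_this.monthly_total > 0   -- assumption taken from the question, not the query above
  and ms_this.invoice_month < ?   -- skip the current month, which has no next month yet
group by ms_this.invoice_month
order by ms_this.invoice_month
"""
lapsed = conn.execute(query, ("2020-03",)).fetchall()
print(lapsed)
```

Here u2 lapses after January and u1 after February; u3 is invoiced every following month and u5 never has a positive total, so neither is counted.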

Get distinct rows based on one column with T-SQL

I have a column in the following format:
Time Value
17:27 2
17:27 3
I want to get the distinct rows based on one column: Time. So my expected result would be a single row, either 17:27 2 or 17:27 3.
Distinct
T-SQL uses distinct on multiple columns instead of one. Distinct would return two rows since the combinations of Time and Value are unique (see below).
select distinct [Time], * from SAPQMDATA
would return
Time Value
17:27 2
17:27 3
instead of
Time Value
17:27 2
Group by
Also group by does not appear to work
select * from table group by [Time]
Will result in:
Column 'Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Questions
How can I select all unique 'Time' columns without taking into account other columns provided in a select query?
How can I remove duplicate entries?
This is where ROW_NUMBER will be your best friend. Using this as your sample data...
time value
-------------------- -----------
17:27 2
17:27 3
11:36 9
15:14 5
15:14 6
... below are two solutions that you can copy/paste/run.
DECLARE @youtable TABLE ([time] VARCHAR(20), [value] INT);
INSERT @youtable VALUES ('17:27',2),('17:27',3),('11:36',9),('15:14',5),('15:14',6);
-- The most elegant way to solve this
SELECT TOP (1) WITH TIES t.[time], t.[value]
FROM @youtable AS t
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL));
-- A more efficient way to solve this
SELECT t.[time], t.[value]
FROM
(
SELECT t.[time], t.[value], ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL)) AS RN
FROM @youtable AS t
) AS t
WHERE t.RN = 1;
Each returns:
time value
-------------------- -----------
11:36 9
15:14 5
17:27 2
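The ROW_NUMBER pattern is not specific to T-SQL. As a rough illustration, the sketch below runs the derived-table variant in an in-memory SQLite database from Python, using the same sample rows. One deliberate difference: it orders each partition by value instead of (SELECT NULL), so the row kept for each time is deterministic (the smallest value) rather than arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table youtable ("time" text, "value" int)')
conn.executemany("insert into youtable values (?, ?)", [
    ("17:27", 2), ("17:27", 3), ("11:36", 9), ("15:14", 5), ("15:14", 6),
])

# Number the rows within each time value, then keep only the first row of each group.
query = """
select "time", "value"
from (
    select "time", "value",
           row_number() over (partition by "time" order by "value") as rn
    from youtable
) t
where rn = 1
order by "time"
"""
deduped = conn.execute(query).fetchall()
for row in deduped:
    print(row)
```

Changing the ORDER BY inside the window (for example to "value" desc) selects a different survivor per group without touching the rest of the query.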

SQL Find Records that came for the first time with Value 5 within the last week

I have a table similar to this
personId Value createdTime
1991126 19.00 2018-08-05
1991126 16.00 2018-06-15
1991126 18.00 2018-08-06
1991206 32.00 2018-08-02
1991431 6.00 2018-08-06
1991431 7.00 2018-08-07
I am trying to find the personIds that appeared for the first time within the last week with a value of 5 or more.
Here I should show 1991206 and 1991431, because 1991126 already has 16.00 on 2018-06-15.
So the personId should not have any earlier history with a value > 5, which means we have to compare against previous records.
I tried
Select distinct personId,Value,createdTime
where value>=5 and createdtime>= Dateadd(Day,-7,Getdate())
First find all personIds whose earliest record falls within the last week (having min(createdTime) >= Cast(Dateadd(Day,-7,Getdate()) as date)) and then check whether they have a value higher than 5 (max(Value) > 5):
SELECT personID
FROM YourTable
GROUP BY personID
HAVING min(createdTime) >= Cast(Dateadd(Day,-7,Getdate()) AS date) AND max(Value) > 5
You have to first restrict the records to Value > 5, then apply a ranking, and in the end find the records with ranking 1 that are from since last week. Restricting the values and applying the ranking can be done in just one query, but criteria for the ranking must be applied in an outer query. Also, the criteria for the date must not be applied before ranking, because that would rank only the records from last week. I prefer to use common table expressions for nesting queries:
WITH
Greater5 (personID, Value, createdTime, rnk) AS (
SELECT personID, Value, createdTime,
RANK() OVER (PARTITION BY personID ORDER BY createdTime)
FROM YourTable
WHERE Value > 5
)
SELECT personID, Value, createdTime
FROM Greater5
WHERE rnk = 1
AND createdTime >= Dateadd(Day,-7,Getdate())
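The two answers are not equivalent, which a small experiment makes visible. The sketch below is an assumption-laden illustration: it uses an in-memory SQLite database from Python instead of SQL Server, fixes the cutoff at 2018-08-01 (that is, it assumes "today" is 2018-08-08) instead of calling Getdate(), and adds one made-up person, 1991500, who has an old record with a value of 3 and a recent one with a value of 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table YourTable (personID int, Value real, createdTime text)")
conn.executemany("insert into YourTable values (?, ?, ?)", [
    (1991126, 19.00, "2018-08-05"),
    (1991126, 16.00, "2018-06-15"),
    (1991126, 18.00, "2018-08-06"),
    (1991206, 32.00, "2018-08-02"),
    (1991431, 6.00, "2018-08-06"),
    (1991431, 7.00, "2018-08-07"),
    (1991500, 3.00, "2018-05-01"),   # made-up: old record, value not above 5
    (1991500, 9.00, "2018-08-05"),   # made-up: first value above 5, within the week
])
cutoff = "2018-08-01"  # assumed stand-in for Dateadd(Day, -7, Getdate())

# GROUP BY / HAVING: the person's earliest record of any value must be recent.
having_query = """
select personID
from YourTable
group by personID
having min(createdTime) >= ? and max(Value) > 5
order by personID
"""

# Ranking: only records with Value > 5 are ranked; the first of those must be recent.
rank_query = """
with Greater5 as (
    select personID, Value, createdTime,
           rank() over (partition by personID order by createdTime) as rnk
    from YourTable
    where Value > 5
)
select personID
from Greater5
where rnk = 1 and createdTime >= ?
order by personID
"""
by_having = [r[0] for r in conn.execute(having_query, (cutoff,))]
by_rank = [r[0] for r in conn.execute(rank_query, (cutoff,))]
print(by_having)
print(by_rank)
```

On the question's own rows both queries return 1991206 and 1991431. They differ for 1991500: the HAVING version rejects that person because of the old low-value record, while the ranking version ignores records that are not above 5 and so includes them. Which behaviour is wanted depends on how "no history with > 5" is meant.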