Rolling N-month average in Redshift with multiple entries per month

I would like to use Redshift's window aggregate functions to compute an N-month rolling average of some data. The data has multiple entries in any given month. If possible, I'd like to avoid first grouping and averaging within months before computing the rolling average. That approach takes an average of averages, which is not ideal; it is what the post "3 Month Moving Average - Redshift SQL" does.
Here is a sample dataset for a single account (the real data has more than one account):
Quote_Date  Account  Value
3/24/2015   acme     3
3/25/2015   acme     7
4/1/2015    acme     12
4/3/2015    acme     17
5/15/2015   acme     1
6/30/2015   acme     3
7/30/2015   acme     9
And this is the result I would like for a 3-month rolling average, as an example:
Quote_Date  Account  Value  Month  3M_Rolling_Average
3/24/2015   acme     3      1      3
3/25/2015   acme     7      1      5
4/1/2015    acme     12     2      7.33
4/3/2015    acme     17     2      9.75
5/15/2015   acme     1      3      8
6/30/2015   acme     3      4      8.25
7/30/2015   acme     9      5      4.33
The code I have tried looks like this:
avg(Value) over (partition by Account order by Quote_Date rows between 2 preceding and current row)
But this only covers the current row and the 2 preceding rows. That would work if I had one value per month, but as stated, that is not the case. I am open to any kind of ranking solution or nested partitioning. Any help is greatly appreciated.
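The desired table implies specific semantics. Each row averages every row of the same account, up to and including itself, whose month falls in the current calendar month or one of the previous two. A plain-Python sketch of that reading follows; it is not Redshift SQL. The sample rows come from the question, while the function names and the month_index helper are hypothetical. It assumes the rows are already sorted by quote date.

```python
from datetime import date

# Sample rows for one account from the question, already ordered by quote date.
ROWS = [
    (date(2015, 3, 24), "acme", 3),
    (date(2015, 3, 25), "acme", 7),
    (date(2015, 4, 1), "acme", 12),
    (date(2015, 4, 3), "acme", 17),
    (date(2015, 5, 15), "acme", 1),
    (date(2015, 6, 30), "acme", 3),
    (date(2015, 7, 30), "acme", 9),
]


def month_index(d):
    """Absolute month number, so consecutive calendar months differ by 1."""
    return d.year * 12 + d.month


def rolling_n_month_avg(rows, n_months=3):
    """For each row, average the values of all earlier-or-current rows of the
    same account whose month lies within the last n_months calendar months."""
    result = []
    for i, (d, acct, _) in enumerate(rows):
        first_month = month_index(d) - (n_months - 1)
        window = [v for d2, a2, v in rows[: i + 1]
                  if a2 == acct and month_index(d2) >= first_month]
        result.append(round(sum(window) / len(window), 2))
    return result


print(rolling_n_month_avg(ROWS))
```

This prints [3.0, 5.0, 7.33, 9.75, 8.0, 8.25, 4.33], matching the 3M_Rolling_Average column. The window is measured in months rather than rows, which is why the ROWS frame in the attempted query does not reproduce it.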

Since an average is just sum() / count(), you only need to group by month and compute both the sum() and the count(). Then use lag() to add up 3 months of sums and divide by the sum of 3 months of counts. You are right that an average of averages is not correct, but if you carry the sums and counts through, it works.
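The sum/count idea from the answer can be sketched in plain Python; this is not Redshift SQL. Rows are first aggregated to a SUM and COUNT per (account, month). Each month's rolling average is then the sum of the last three monthly sums divided by the sum of the last three monthly counts. The sample rows are the question's data, and the function name is hypothetical.

```python
from datetime import date

ROWS = [
    (date(2015, 3, 24), "acme", 3),
    (date(2015, 3, 25), "acme", 7),
    (date(2015, 4, 1), "acme", 12),
    (date(2015, 4, 3), "acme", 17),
    (date(2015, 5, 15), "acme", 1),
    (date(2015, 6, 30), "acme", 3),
    (date(2015, 7, 30), "acme", 9),
]


def monthly_rolling_avg(rows, n_months=3):
    """Aggregate SUM and COUNT per (account, month), then divide the summed
    sums by the summed counts over the last n_months calendar months.
    Monthly averages are never averaged."""
    totals = {}  # (account, month_index) -> [sum, count]
    for d, acct, v in rows:
        agg = totals.setdefault((acct, d.year * 12 + d.month), [0, 0])
        agg[0] += v
        agg[1] += 1

    result = {}
    for acct, m in sorted(totals):
        window = [totals[(acct, k)] for k in range(m - n_months + 1, m + 1)
                  if (acct, k) in totals]  # months with no rows are skipped
        total = sum(s for s, _ in window)
        count = sum(c for _, c in window)
        year, month0 = divmod(m - 1, 12)  # back from month index to (year, month)
        result[(acct, year, month0 + 1)] = round(total / count, 2)
    return result


for key, avg in monthly_rolling_avg(ROWS).items():
    print(key, avg)
```

Each monthly figure (5, 9.75, 8, 8.25, 4.33) equals the desired value on the last row of that month. Because this sketch steps over calendar months, a month with no rows simply contributes nothing. A lag()-based query over the grouped rows steps over rows instead, so it implicitly assumes every month has at least one row.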

Related

Period To Date Median Over Time

I'm trying to calculate in QuickSight the period-to-date median over time on a quarterly basis, so given these values:
Value  Date
1      01/01/2023
2      01/02/2023
3      01/02/2023
4      01/02/2023
5      01/03/2023
6      01/04/2023
7      01/04/2023
8      01/05/2023
Ideally I would like to get these values:
Value  Date
1      01/2023
2.5    02/2023
3      03/2023
6.5    04/2023
7      05/2023
QuickSight has functions such as periodToDateAvgOverTime() that perform this kind of period-to-date calculation for other aggregations. Is there a way to calculate it for the median with a custom formula or some workaround?
You could use periodToDateAvgOverTime(median(value),....), or use medianIf(value, condition) with the date as the condition.
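The expected numbers are quarter-to-date medians. For each month, take the median of every value from the first month of that quarter through the current month. The Python sketch below reproduces them; it is not QuickSight syntax, and the function name is hypothetical. It assumes the input dates are in DD/MM/YYYY order, which is the reading consistent with the expected monthly output.

```python
from datetime import date
from statistics import median

# Sample values from the question, with dates read as DD/MM/YYYY.
DATA = [
    (date(2023, 1, 1), 1),
    (date(2023, 2, 1), 2),
    (date(2023, 2, 1), 3),
    (date(2023, 2, 1), 4),
    (date(2023, 3, 1), 5),
    (date(2023, 4, 1), 6),
    (date(2023, 4, 1), 7),
    (date(2023, 5, 1), 8),
]


def quarter_to_date_median(data):
    """For each month present, the median of all values from the start of
    that month's quarter through the end of the month."""
    out = {}
    for year, month in sorted({(d.year, d.month) for d, _ in data}):
        q_start = month - (month - 1) % 3  # first month of the quarter
        window = [v for d, v in data
                  if d.year == year and q_start <= d.month <= month]
        out[f"{month:02d}/{year}"] = median(window)
    return out


print(quarter_to_date_median(DATA))
```

The window restarts in April because a new quarter begins there. That is why 04/2023 is the median of 6 and 7 only.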

How to add an incrementing id column that goes up when a value in another column resets to 1 in PostgreSQL

I have a SQL table which has two columns called seq and sub_seq as seen below. I would like to add a third column called id, which goes up by 1 every time the sub_seq starts again at 1 as shown in the table below.
seq  sub_seq  id
1    1        1
2    2        1
3    3        1
4    4        1
5    5        1
6    1        2
7    2        2
8    3        2
9    1        3
10   2        3
11   3        3
12   4        3
13   5        3
14   6        3
15   7        3
I could write a solution in PL/pgSQL, but I would like to know whether there is a way to do this in standard SQL. Any help would be greatly appreciated.
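If sub_seq is always a running sequence (it restarts at 1 and then increases by 1 on each row, in step with seq), you can use the DENSE_RANK() function ordered by the difference of the two columns. The difference seq - sub_seq is constant within each group and larger for each later group.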
SELECT seq, sub_seq, DENSE_RANK() OVER (ORDER BY seq - sub_seq) AS id
FROM tableDemo
ORDER BY seq;
This solution is based on the sample data you provided; more sample data would help to check the whole scenario.
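Below is a Python sketch of the answer's DENSE_RANK idea alongside a more general alternative (a swapped-in technique, not from the answer). The alternative is a running count of rows where sub_seq = 1, which only assumes that each group starts at 1 and the rows are ordered by seq. In SQL it should correspond to SUM(CASE WHEN sub_seq = 1 THEN 1 ELSE 0 END) OVER (ORDER BY seq). The function names are hypothetical.

```python
# Sample (seq, sub_seq) pairs from the question, ordered by seq.
ROWS = list(zip(range(1, 16), [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7]))


def ids_by_dense_rank(rows):
    """Mimic DENSE_RANK() OVER (ORDER BY seq - sub_seq)."""
    distinct = sorted({seq - sub for seq, sub in rows})
    rank = {diff: i + 1 for i, diff in enumerate(distinct)}
    return [rank[seq - sub] for seq, sub in rows]


def ids_by_reset_count(rows):
    """Running count of rows with sub_seq == 1; rows must be ordered by seq."""
    ids, current = [], 0
    for _, sub in rows:
        if sub == 1:
            current += 1  # a new group starts whenever sub_seq restarts at 1
        ids.append(current)
    return ids


print(ids_by_dense_rank(ROWS))
print(ids_by_reset_count(ROWS))
```

On the sample data both return the id column from the question. They diverge when sub_seq skips a value inside a group, for example (3, 4) following (2, 2). There the difference seq - sub_seq changes mid-group, while the reset count still tracks the restarts.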

How to average monthly data using a specific method in MATLAB?

I have the following vector of monthly values (vectorA). I have put the date-related information next to it to illustrate the task, but I work with the vector alone.
dates month_in_q vectorA
31/01/2020 1 10
29/02/2020 2 15
31/03/2020 3 6
30/04/2020 1 8
31/05/2020 2 4
30/06/2020 3 3
How can I create a new vector, vectorNEW, according to this algorithm:
In each quarter the first month is the original first month
In each quarter the second month is the average of the first and second months
In each quarter the third month is the average of all three months
By applying this recurring pattern to the original vectorA in a loop, I would like to get the following vectorNEW:
dates month_in_q vectorA vectorNEW
31/01/2020 1 10 10
29/02/2020 2 15 AVG(10+15)
31/03/2020 3 6 AVG(10+15+6)
30/04/2020 1 8 8
31/05/2020 2 4 AVG(8+4)
30/06/2020 3 3 AVG(8+4+3)
... ... ... ...
An elegant solution was provided by the user dpb on the MathWorks website:
vectorNEW=reshape(cumsum(reshape(vectorA,3,[]))./[1:3].',[],1);
More details are in the original thread:
https://uk.mathworks.com/matlabcentral/answers/823055-how-to-average-monthly-data-using-a-specific-method-in-matlab
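The MATLAB one-liner works in three steps. It reshapes the vector into a 3-by-Q matrix with one column per quarter, takes cumulative sums down each column, and divides row k by k. A Python sketch of the same computation with explicit loops follows; the function name is hypothetical. Like reshape(vectorA,3,[]), it requires the length to be a multiple of 3.

```python
def quarter_cumulative_mean(values, period=3):
    """Running mean within each consecutive block of `period` values,
    mirroring reshape(cumsum(reshape(v,3,[]))./[1:3].',[],1) in MATLAB."""
    if len(values) % period:
        raise ValueError("length must be a multiple of the period")
    out = []
    for start in range(0, len(values), period):
        running = 0
        # k counts months within the quarter: 1, 2, 3
        for k, v in enumerate(values[start:start + period], start=1):
            running += v
            out.append(running / k)
    return out


print(quarter_cumulative_mean([10, 15, 6, 8, 4, 3]))
```

For the sample vectorA, this yields 10, 12.5 and 31/3 for the first quarter, then 8, 6 and 5 for the second.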

Calculating group means with own group excluded in MATLAB

In generic terms, the issue is: I need to compute group means that exclude the observations of the own group.
As an example, say I have firms, products and product characteristics. Each firm (f=1,...,F) produces several products (i=1,...,I). For product i of firm f, I would like to compute the mean of a certain characteristic over all products of all other firms, i.e. excluding firm f's products.
So I could have a dataset like this:
firm prod width
1 1 30
1 2 10
1 3 20
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
To reproduce the table:
firm=[1,1,1,2,2,2,3,3]
prod=[1,2,3,1,2,4,2,4]
hp=[30,10,20,25,15,40,10,35]
x=[firm' prod' hp']
For firm 1, I want a mean over the values of all products of all other firms, i.e. excluding every firm 1 product; the grouping is at the firm level. (This mean will be used as an instrumental variable for the width of all products of firm 1.)
So, the mean that I should find is: (25+15+40+10+35)/5=25
Then repeat the process for other firms.
firm prod width mean_desired
1 1 30 25
1 2 10 25
1 3 20 25
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
My main difficulty is excluding the firm's own values.
This question is related to "Calculating group mean/medians in MATLAB where group ID is in a separate column", but that question does not exclude the own group.
P.S. Out of curiosity, in case anyone here works in economics: I am trying to construct Hausman or BLP instruments.
Here's a way that avoids loops, but may be memory-expensive. Let x denote your three-column data matrix.
m = bsxfun(@ne, x(:,1).', unique(x(:,1))); % or m = ~sparse(x(:,1), 1:size(x,1), true);
result = m*x(:,3);
result = result./sum(m,2);
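This creates a zero-one matrix m in which each row corresponds to one firm. Multiplying m by the width column of x (second line of code) therefore gives, for each firm, the sum of the widths in all other firms. m is built by comparing each entry in the firm column of x with the unique values of that column (first line). Dividing by the number of observations in the other firms, sum(m,2) (third line), then gives the desired result.
If you need the results repeated per row of the original firm column, use result(x(:,1)). This indexing assumes the firm IDs are the consecutive integers 1, ..., F, so that they coincide with the row positions of unique(x(:,1)).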
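The same numbers can be computed without the F-by-N indicator matrix. Each firm's leave-own-group-out mean equals (total sum - own-group sum) / (total count - own-group count), which needs only one pass over the data. Below is a Python sketch of that alternative (a swapped-in technique, not the answer's MATLAB code), using the firm and width data from the question; the function name is hypothetical.

```python
from collections import defaultdict

# Firm IDs and widths from the question's example.
FIRM = [1, 1, 1, 2, 2, 2, 3, 3]
WIDTH = [30, 10, 20, 25, 15, 40, 10, 35]


def leave_group_out_means(groups, values):
    """Mean of all values outside each group, via totals minus own-group totals.
    Requires at least two distinct groups (otherwise the count would be zero)."""
    total_sum, total_count = sum(values), len(values)
    group_sum, group_count = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        group_sum[g] += v
        group_count[g] += 1
    return {g: (total_sum - group_sum[g]) / (total_count - group_count[g])
            for g in group_sum}


means = leave_group_out_means(FIRM, WIDTH)
per_row = [means[f] for f in FIRM]  # like result(x(:,1)) in the MATLAB answer
print(means)
```

Firm 1 gets 125/5 = 25, as in the question. Firm 2 gets 105/5 = 21 and firm 3 gets 140/6 ≈ 23.33.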

Calculating MAX(DATE) for Value Groups Where Values Go Back and Forth

I have another challenge that I have not been able to solve yet. Here is the scenario (pardon the formatting if it gets mangled when posted):
ACCT_NUM  CERT_ID  Code  Date      Desired_Output
A         1        10    1/1/2007  1/1/2008
A         1        10    1/1/2008  1/1/2008
A         1        20    1/1/2009  1/1/2010
A         1        20    1/1/2010  1/1/2010
A         1        10    1/1/2011  1/1/2012
A         1        10    1/1/2012  1/1/2012
A         2        20    1/1/2007  1/1/2008
A         2        20    1/1/2008  1/1/2008
A         2        10    1/1/2009  1/1/2010
A         2        10    1/1/2010  1/1/2010
A         2        30    1/1/2011  1/1/2011
A         2        10    1/1/2012  1/1/2013
A         2        10    1/1/2013  1/1/2013
As you can see, I need to take the MAX of the date for each group of consecutive Code values (within each ACCT_NUM and CERT_ID), ending where the value changes. If the same value appears again later, I need to take a separate MAX for that new group. For example, for CERT_ID '1', I cannot group all four rows with Code 10 to get a MAX date of 1/1/2012. I need one MAX for the first two rows and another MAX for the last two rows, because a different code appears in between. I am trying to accomplish this in Cognos Framework Manager.
Gurus, please advise.
The syntax for getting the max value for CERT_ID is:
maximum(Date for CERT_ID)
If you want the max at additional levels, you can use the following syntax:
maximum(Date for ACCT_NUM,CERT_ID,Code)
In general, it is best practice to group and summarize values in the report, not in Framework Manager.
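The requirement in the question, where a repeated code after a different code starts a new group, is commonly called a gaps-and-islands problem. Below is a Python sketch of that grouping logic; it is not Cognos syntax. It assumes rows are sorted by ACCT_NUM, CERT_ID and Date, uses the question's sample data, and the function name is hypothetical. itertools.groupby only merges consecutive equal keys, which is exactly the "run" behaviour required here. In SQL, the same idea is usually expressed by flagging rows whose code differs from the previous row's code and taking a running sum of those flags as a group id.

```python
from datetime import date
from itertools import groupby

# (ACCT_NUM, CERT_ID, Code, Date) from the question, sorted by account, cert, date.
ROWS = (
    [("A", 1, c, date(y, 1, 1))
     for c, y in zip([10, 10, 20, 20, 10, 10], range(2007, 2013))]
    + [("A", 2, c, date(y, 1, 1))
       for c, y in zip([20, 20, 10, 10, 30, 10, 10], range(2007, 2014))]
)


def run_max_dates(rows):
    """Max date of each row's run, where a run is a maximal block of
    consecutive rows with the same Code within (ACCT_NUM, CERT_ID)."""
    out = []
    for _, cert_rows in groupby(rows, key=lambda r: (r[0], r[1])):
        for _, run in groupby(cert_rows, key=lambda r: r[2]):
            run = list(run)
            latest = max(r[3] for r in run)
            out.extend([latest] * len(run))  # one result per input row
    return out


for row, result in zip(ROWS, run_max_dates(ROWS)):
    print(row[1], row[2], row[3], result)
```

The result column matches Desired_Output. The two Code 10 runs for CERT_ID 1 get 1/1/2008 and 1/1/2012 separately instead of a single shared maximum.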