Recursive CTE in Postgres - SUM at each parent node

I'm storing a hierarchical product tree as an adjacency list, with a separate table for all the sales of those products.
I'm trying to present a "Sales Report" to the user that shows the total quantity and sum sold for each product and for each parent level.
The sales table has no per-group rows; sales are recorded only at the child (product) level. From what I have read, I need a recursive CTE, and I have tried writing some queries without success.
Example of my dataset:
F - folders
P - products
Products table:
id | name | parentid
1 F1 NULL
2 F2 1
3 P1 2
4 P2 2
Sales table:
id | quant | sum
3 3 90
4 2 100
What I need to obtain in the report:
id | name | parentid | quant | sum
1 F1 NULL 5 190
2 F2 1 5 190
3 P1 2 3 90
4 P2 2 2 100
Logically, I understand that I need to fetch each row and recursively go through all its children in order to SUM the quant and the sum, but I have no clue how to write it.
I'd be thankful for any guidance on where I could read more about recursive CTEs, or anything else that could help with my situation.
Cheers!
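One way to approach this, sketched here rather than taken from an answer in the thread: let a recursive CTE pair every node with itself and all of its descendants, then join those pairs to the sales and group by the ancestor. The example runs the SQL through Python's sqlite3 module purely so it can be executed as-is; the WITH RECURSIVE syntax is standard and should carry over to Postgres. It assumes that sales.id refers to products.id, and the CTE name tree and its columns root_id and node_id are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, parentid INTEGER);
CREATE TABLE sales (id INTEGER, quant INTEGER, "sum" INTEGER);
INSERT INTO products VALUES (1, 'F1', NULL), (2, 'F2', 1), (3, 'P1', 2), (4, 'P2', 2);
INSERT INTO sales VALUES (3, 3, 90), (4, 2, 100);
""")

# tree pairs each node (root_id) with itself and every descendant (node_id).
# Assumption: sales.id is the id of the product that was sold.
query = """
WITH RECURSIVE tree(root_id, node_id) AS (
    SELECT id, id FROM products              -- every node is its own root
    UNION ALL
    SELECT t.root_id, p.id                   -- walk down one level at a time
    FROM tree AS t
    JOIN products AS p ON p.parentid = t.node_id
)
SELECT pr.id, pr.name, pr.parentid,
       COALESCE(SUM(s.quant), 0) AS quant,
       COALESCE(SUM(s."sum"), 0) AS "sum"
FROM products AS pr
JOIN tree AS t ON t.root_id = pr.id
LEFT JOIN sales AS s ON s.id = t.node_id
GROUP BY pr.id, pr.name, pr.parentid
ORDER BY pr.id
"""

report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

Starting the recursion from every node, not only from the roots, is what gives each folder its own subtotal; the LEFT JOIN keeps folders and products that have no sales at all, with totals of 0.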

Related

Number of Order From Customers

This seems like a simple question, but I'm having trouble building my graph. I'm trying to get the number of customers who made 1 order, 2 orders, 3 orders, etc.
Sample Data Source:
Customer ID| Order ID| Date Ordered
A 10 06/01/2019
A 11 06/02/2019
A 12 06/02/2019
B 15 06/05/2019
B 16 06/05/2019
B 17 06/05/2019
C 20 06/06/2019
C 21 06/06/2019
I can easily get the graph to show that Customer A made 3 orders, Customer B made 3 orders, Customer C made 2 orders, etc.
What I'm trying to show is how many customers placed a certain number of orders. In our sample data: 1 order = 0 customers, 2 orders = 1 customer, 3 orders = 2 customers. So on the X axis I'm trying to show 1 Order, 2 Orders, 3 Orders, 4 Orders, etc.
I tried calculations such as IF COUNT([CustomerID]) > 2 then '1 order', but I can't seem to get it right. Any advice will be helpful. Thanks in advance.
Maybe you can try using LOD expressions and create a new calculated field like this:
{INCLUDE [Customer ID]: COUNTD([Order ID])}
And then use that field to show that info.
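The underlying logic is a two-step aggregation: first count the orders per customer, then count the customers per order count. As a rough illustration outside Tableau, the same idea can be written as a nested SQL query, run here through Python's sqlite3 module. The table name orders and its column names are assumptions made for this sketch, and the dates are kept as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id TEXT, order_id INTEGER, date_ordered TEXT);
INSERT INTO orders VALUES
    ('A', 10, '06/01/2019'), ('A', 11, '06/02/2019'), ('A', 12, '06/02/2019'),
    ('B', 15, '06/05/2019'), ('B', 16, '06/05/2019'), ('B', 17, '06/05/2019'),
    ('C', 20, '06/06/2019'), ('C', 21, '06/06/2019');
""")

# Inner query: orders per customer. Outer query: customers per order count.
distribution = conn.execute("""
SELECT orders_per_customer, COUNT(*) AS customers
FROM (
    SELECT customer_id, COUNT(DISTINCT order_id) AS orders_per_customer
    FROM orders
    GROUP BY customer_id
) AS per_customer
GROUP BY orders_per_customer
ORDER BY orders_per_customer
""").fetchall()

print(distribution)
```

Order counts that no customer has (such as 1 order in the sample data) simply do not appear in the result, so a chart that must show them as 0 needs those buckets supplied separately.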

SPSS - Create dummy for top volume months within customer grouping

I need to create a dummy for the top purchase months within each customer ID. That is, if a month is one of the four months within the year in which the customer purchased the most, it is marked with 1, otherwise with 0.
Example of data, cust id, order date, volume and new variable dummy:
This code creates some sample data:
data list free/ID volume (2f4).
begin data
1 100 1 500 2 1 2 2 2 3 2 90 1 600 1 90 1 870 2 9 2 8 2 10
end data.
Using the sample data above, this code will create a new variable containing the dummy according to your definition:
* Rank descending within each ID, so that rank 1 is the largest volume.
RANK VARIABLES=volume (D) BY ID /RANK.
compute high4=(Rvolume<=4).
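For readers without SPSS, the same definition (rank each customer's rows from largest to smallest volume and flag the first four) can be sketched with a SQL window function, run here through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table name purchases is an assumption; the rows are the sample data from the DATA LIST block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (id INTEGER, volume INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 100), (1, 500), (2, 1), (2, 2), (2, 3), (2, 90),
     (1, 600), (1, 90), (1, 870), (2, 9), (2, 8), (2, 10)],
)

# RANK() restarts for every id; rank 1 is that id's largest volume.
flagged = conn.execute("""
SELECT id, volume,
       CASE WHEN RANK() OVER (PARTITION BY id ORDER BY volume DESC) <= 4
            THEN 1 ELSE 0 END AS high4
FROM purchases
ORDER BY id, volume DESC
""").fetchall()

for row in flagged:
    print(row)
```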

"Inserting" Records into Fields from a Database Feed

So the background to this is I'm trying to create a survival curve based on a database feed from the directions here.
What I have so far is three calculated fields per below. Patient ID is not a calculated field or necessary for the survival analysis, but I believe it could be useful for this question. For reference, there are about 20,000 unique patients.
Patient ID | Time | Censor | Group
Id1 3 0 1
Id2 8 0 2
Id3 1 1 1
Id4 3 1 1
Id5 11 0 1
Id5 7 1 2
What I would like to do is insert two records (one for each group), so that the result looks like this:
Patient ID | Time | Censor | Group | Link
NULL 0 NULL 1 NULL
NULL 0 NULL 2 NULL
Id1 3 0 1 link
Id2 8 0 2 link
Id3 1 1 1 link
Id4 3 1 1 link
Id5 11 0 1 link
Id5 7 1 2 link
I tried, unsuccessfully, to create an Excel spreadsheet with these base attributes to union with the columns; however, an Excel spreadsheet does not appear to be able to union with a database.
My next idea is to find 2 of the 20,000 patients where I can create a calculated field along these lines (not sure this is feasible in Tableau, please excuse my syntax):
IF [Patient ID] = Id3 THEN [TIME] = 0 AND [CENSOR] IS NULL
END
and then a [Link] calculated formula:
IF [Patient ID] = Id3 THEN NULL
ELSE "link"
END
Any help would be appreciated. Would like to avoid inserting these records in the database.
The best and easiest option is to use an outer join to your Excel workbook. This is a new feature in Tableau version 10 (cross-database joins).
Then, once the dataset is combined, you can build business logic through a filter or calculated field based on the absence or presence of the Excel data.
http://www.tableau.com/about/blog/2016/7/integrate-your-data-cross-database-joins-56724
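If the data source can be defined by a query instead, the target shape from the question can also be produced without storing anything, by appending one seed row per group with UNION ALL. This is a different technique from the cross-database join described above, shown only as a sketch: it runs through Python's sqlite3 module, and the table name patients and the column names patient_id, obs_time, censor, grp and link are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id TEXT, obs_time INTEGER, censor INTEGER, grp INTEGER);
INSERT INTO patients VALUES
    ('Id1', 3, 0, 1), ('Id2', 8, 0, 2), ('Id3', 1, 1, 1),
    ('Id4', 3, 1, 1), ('Id5', 11, 0, 1), ('Id5', 7, 1, 2);
""")

# First branch: one synthetic row per group, at time 0, with no patient or link.
# Second branch: the real rows, each tagged with the constant 'link'.
combined = conn.execute("""
SELECT NULL AS patient_id, 0 AS obs_time, NULL AS censor, g.grp AS grp, NULL AS link
FROM (SELECT DISTINCT grp FROM patients) AS g
UNION ALL
SELECT patient_id, obs_time, censor, grp, 'link'
FROM patients
ORDER BY patient_id, grp
""").fetchall()

for row in combined:
    print(row)
```

The seed rows exist only in the query result, so nothing is written to the patients table.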

SQL Sum and Group By for a running Tally?

I'm completely rewriting my question to simplify it. Sorry if you read the prior version. (The previous version of this question included a very complex query example that created a distraction from what I really need.) I'm using SQL Express.
I have a table of lessons.
LessonID StudentID StudentName LengthInMinutes
1 1 Chuck 120
2 2 George 60
3 2 George 30
4 1 Chuck 60
5 1 Chuck 10
These would be ordered by date. (Of course the actual table is thousands of records with dates and other lesson-related data but this is a simplification.)
I need to query this table such that I get all rows (or a subset of rows by a date range or by student), but I need my query to add a new column we might call PriorLessonMinutes. That is, the sum of all minutes of all lessons for the same student in lessons of PRIOR dates only.
So the query would return:
LessonID StudentID StudentName LengthInMinutes PriorLessonMinutes
1 1 Chuck 120 0
2 2 George 60 0
3 2 George 30 60 (The sum Length from row 2 only)
4 1 Chuck 60 120 (The sum Length from row 1 only)
5 1 Chuck 10 180 (The sum of Length from rows 1 and 4)
In essence, I need a running tally of the sum of prior lesson minutes for each student. Ideally the tally shouldn't include the current row, but if it does, no big deal as I can do subtraction in the code that receives the query.
Further, (and this is important) if I retrieve only a subset of records, (for example by a date range) PriorLessonMinutes must be a sum that considers rows that are NOT returned.
My first idea was to use SUM() and to GROUP BY Student, but that isn't right because unless I'm mistaken it would include a sum of minutes for all rows for each student, including rows that come after the row which aren't relevant to the sum I need.
OPTIONS I'M REJECTING: I could scan through all rows in my code that receives it, (although this would force me to retrieve all rows unnecessarily) but that's obviously inefficient. I could also put a real data field in there and populate it, but this too presents problems when other records are deleted or altered.
I have no idea how to put such a query together. Any guidance?
This is a great opportunity to use Windowed Aggregates. The trick is that you need SQL Server 2012 Express. If you can get it, then this is the query you are looking for:
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
Note that it returns NULLs instead of 0s (zeroes) for a student's first lesson. If you insist on zeroes, use the COALESCE function to turn NULLs into zeroes.
I suggest using a nested query to limit the number of rows returned:
select * from
(
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
) as NestedLessons
where LessonId > 3 -- this is an example of a filter
This way the filter is applied after the aggregation is complete.
Now, if you want to apply a filter that doesn't affect the aggregation (like only querying data for a certain student), you should apply the filter to the inner query, as pruning the rows that don't affect the computation early (like data for other students) will improve the performance.
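To check the behaviour against the sample rows from the question, the windowed query can be run through Python's sqlite3 module, used here only as a stand-in for SQL Server 2012 (the frame syntax is the same; SQLite needs version 3.25 or newer). The COALESCE wrapper and the explicit column list are additions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Lessons (LessonID INTEGER, StudentID INTEGER,
                      StudentName TEXT, LengthInMinutes INTEGER);
INSERT INTO Lessons VALUES
    (1, 1, 'Chuck', 120), (2, 2, 'George', 60), (3, 2, 'George', 30),
    (4, 1, 'Chuck', 60), (5, 1, 'Chuck', 10);
""")

# Running total over earlier lessons of the same student, excluding the current row.
inner = """
SELECT LessonID, StudentID, StudentName, LengthInMinutes,
       COALESCE(SUM(LengthInMinutes) OVER (
           PARTITION BY StudentID ORDER BY LessonID
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS PriorLessonMinutes
FROM Lessons
"""

all_rows = conn.execute(inner + " ORDER BY LessonID").fetchall()

# Filtering in the outer query keeps earlier, unreturned lessons in the sum.
subset = conn.execute(
    "SELECT * FROM (" + inner + ") AS NestedLessons WHERE LessonID > 3 ORDER BY LessonID"
).fetchall()

print(all_rows)
print(subset)
```

The subset still reports 120 and 180 prior minutes for Chuck even though lesson 1 is filtered out, which is the requirement the question calls important.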
I think the following code will serve your purpose. Check it:
select Students.StudentID ,Students.First, Students.Last,sum(Lessons.LengthInMinutes)
as TotalPriorMinutes from lessons,students
where Lessons.StartDateTime < getdate()
and Lessons.StudentID = Students.StudentID
and StartDateTime >= '20090130 00:00:00' and StartDateTime < '20790101 00:00:00'
group by Students.StudentID ,Students.First, Students.Last

Calculating change in leaders for baseball stats in MSSQL

Imagine I have an MSSQL 2005 table (bbstats) that is updated weekly, showing various cumulative categories of baseball accomplishments for a team
week 1
Player H SO HR
Sammy 7 11 2
Ted 14 3 0
Arthur 2 15 0
Zach 9 14 3
week 2
Player H SO HR
Sammy 12 16 4
Ted 21 7 1
Arthur 3 18 0
Zach 12 18 3
I wish to highlight textually where there has been a change in leader for each category. So after week 2 there would be nothing to report on hits (H); Zach has joined Arthur with the most strikeouts (SO) at 18; and Sammy is the new leader in home runs (HR) with 4.
So I would want to set up a process something like
a) save the past data(week 1) as table bbstatsPrior,
b) updates the bbstats for the new results - I do not need assistance with this
c) compare between the tables for the player (or players, with ties) with the max value for each column, and output only where they differ
d) move onto next column and repeat
In any real-world example there would be significantly more columns to calculate for.
Thanks
Responding to Brent's comments, I am really after any changes in the leaders for each category
So I would have something like
select top 1 with ties player
from bbstatsPrior
order by H desc
and
select top 1 with ties player,H
from bbstats
order by H desc
I then want to compare the player from each query (do I need temp tables?). If they differ, I want to output the result of the second select statement. For the H category Ted is the leader in both tables, but for other categories there are changes between the weeks.
I can then loop through the columns using
select name from sys.all_columns sc
where sc.object_id=object_id('bbstats') and name <>'player'
If the number of stats doesn't change often, you could easily just write a single query to get this data. Join bbStats to bbStatsPrior where bbstatsprior.week < bbstats.week and bbstats.week=#weekNumber. Then just do a simple comparison of bbstats.Hits with bbstatsPrior.Hits to get your difference.
If the stats change often, you could use dynamic SQL to do this for all columns that match a certain pattern or are in a list of columns based on sys.columns for that table?
You could add a column for each stat column to designate the leader using a correlated subquery to find the max value for that column and see if it's equal to the current record.
This might get you started, but I'd recommend posting what you currently have to achieve this and the community can help you from there.
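As a starting point, the comparison described in the question can be sketched as follows: for each stat column, find the leader or leaders in both tables (all players whose value equals the column maximum, which also covers ties) and report the new leaders only when the set of names has changed. The sketch runs through Python's sqlite3 module instead of MSSQL 2005; since SQLite has no TOP 1 WITH TIES, the leaders are found with a MAX() subquery, and the column loop is done in Python in place of sys.all_columns. The helper name leaders is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bbstatsPrior (player TEXT, H INTEGER, SO INTEGER, HR INTEGER);
CREATE TABLE bbstats (player TEXT, H INTEGER, SO INTEGER, HR INTEGER);
INSERT INTO bbstatsPrior VALUES
    ('Sammy', 7, 11, 2), ('Ted', 14, 3, 0), ('Arthur', 2, 15, 0), ('Zach', 9, 14, 3);
INSERT INTO bbstats VALUES
    ('Sammy', 12, 16, 4), ('Ted', 21, 7, 1), ('Arthur', 3, 18, 0), ('Zach', 12, 18, 3);
""")

def leaders(table, column):
    """Return every (player, value) row that holds the column maximum, ties included."""
    # table and column come from the fixed names below, never from user input.
    sql = (f"SELECT player, {column} FROM {table} "
           f"WHERE {column} = (SELECT MAX({column}) FROM {table}) ORDER BY player")
    return conn.execute(sql).fetchall()

changes = {}
for column in ("H", "SO", "HR"):
    before = {player for player, _ in leaders("bbstatsPrior", column)}
    after = leaders("bbstats", column)
    if before != {player for player, _ in after}:
        changes[column] = after  # report only categories whose leaders changed

print(changes)
```

On the sample weeks this reports nothing for H, Arthur and Zach tied at 18 for SO, and Sammy with 4 for HR, matching the outcome described in the question.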