Aggregate parents recursively in PostgreSQL

In a child-parent table, I need to aggregate all parents for each child. I can readily get children per parent in a CTE query, but can't figure out how to reverse it (sqlfiddle here). Given this:
CREATE TABLE rel(
  child integer,
  parent integer
);
INSERT INTO rel(child, parent)
VALUES
  (1,NULL),
  (2,1),
  (3,1),
  (4,3),
  (5,2),
  (6,4),
  (7,2),
  (8,7),
  (9,8);
I need a query that will return an array of parents for each child (order is not important):
1, {NULL}
2, {1}
3, {1}
4, {3,1}
5, {2,1}
6, {4,3,1}
7, {2,1}
8, {7,2,1}
9, {8,7,2,1}

Even though there is an accepted answer, I would like to show how the problem can be solved in pure SQL in a much simpler way, with a recursive CTE:
WITH RECURSIVE t(child, parentlist) AS (
  SELECT child, ARRAY[]::INTEGER[] FROM rel WHERE parent IS NULL
  UNION
  SELECT rel.child, rel.parent || t.parentlist
  FROM rel
  JOIN t ON rel.parent = t.child
)
SELECT * FROM t;
child | parentlist
-------+------------
1 | {}
2 | {1}
3 | {1}
4 | {3,1}
5 | {2,1}
7 | {2,1}
6 | {4,3,1}
8 | {7,2,1}
9 | {8,7,2,1}
(9 rows)
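If only one child's ancestors are needed, the final SELECT of the same CTE can simply be filtered (a usage sketch, using the data above):
WITH RECURSIVE t(child, parentlist) AS (
  SELECT child, ARRAY[]::INTEGER[] FROM rel WHERE parent IS NULL
  UNION
  SELECT rel.child, rel.parent || t.parentlist
  FROM rel
  JOIN t ON rel.parent = t.child
)
SELECT parentlist FROM t WHERE child = 9;  -- yields {8,7,2,1}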
If you insist on having a singleton {NULL} for children with an empty list of parents, just say
SELECT child,
       CASE WHEN CARDINALITY(parentlist) = 0
            THEN ARRAY[NULL]::INTEGER[]
            ELSE parentlist
       END
FROM t;
instead of SELECT * FROM t, but frankly, I don’t see why you should.
A final remark: I am not aware of any efficient way to do this in a relational database, either in pure SQL or with procedural languages. The point is that JOINs are inherently expensive, and if you have really large tables, your queries will take a long time. You can mitigate the problem with indexes, but the best way to tackle this kind of problem is with a graph database rather than an RDBMS.
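To make the index mitigation concrete: the recursive step above looks up rel rows by parent, so that is the column worth indexing. A minimal sketch (rel_parent_idx is an arbitrary name):
CREATE INDEX rel_parent_idx ON rel (parent);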

For this you can create a PL. I did something similar; here is my PL that handles any parent-child structure. It returned a table, but for your case I changed it a little bit:
DROP FUNCTION IF EXISTS ancestors(text, integer, integer);
CREATE OR REPLACE FUNCTION ancestors(
  table_name text,
  son_id integer,     -- the id of the child whose ancestors you want
  ancestors integer)  -- how many ancestors you want; 0 for every ancestor
RETURNS integer[]
AS $$
DECLARE
  ancestors_list integer[];
  father_id integer := 0;
  query text;
  row integer := 0;
BEGIN
  LOOP
    -- look up the current child's parent
    query := 'SELECT child, parent FROM ' || quote_ident(table_name)
          || ' WHERE child=' || son_id;
    EXECUTE query INTO son_id, father_id;
    RAISE NOTICE 'son:% | father: %', son_id, father_id;
    IF son_id IS NOT NULL THEN
      -- record the parent and climb one level up the tree
      ancestors_list := array_append(ancestors_list, father_id);
      son_id := father_id;
    ELSE
      -- child not found: force the exit below
      ancestors := 0;
      father_id := NULL;
    END IF;
    IF ancestors = 0 THEN
      EXIT WHEN father_id IS NULL;
    ELSE
      row := row + 1;
      EXIT WHEN ancestors <= row;
    END IF;
  END LOOP;
  RETURN ancestors_list;
END;
$$ LANGUAGE plpgsql;
Once the PL is created, to get what you want just query:
SELECT *, ancestors('rel', child, 0) FROM rel;
This returns:
child | parent | ancestors
------+--------+-----------------
1 | NULL | {NULL}
2 | 1 | {1,NULL}
3 | 1 | {1,NULL}
4 | 3 | {3,1,NULL}
5 | 2 | {2,1,NULL}
6 | 4 | {4,3,1,NULL}
7 | 2 | {2,1,NULL}
8 | 7 | {7,2,1,NULL}
9 | 8 | {8,7,2,1,NULL}
If you don't want the NULL to appear, just update the PL ;)
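If updating the function feels like overkill, the NULLs can also be stripped at query time. A sketch assuming PostgreSQL 9.3 or later, where array_remove() is available:
SELECT *, array_remove(ancestors('rel', child, 0), NULL) AS ancestors
FROM rel;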

SELECT MAX subquery not allowed in WHERE clause when using WITH RECURSIVE in Postgres

This LeetCode problem, with the given schema
CREATE TABLE IF NOT EXISTS Tasks (task_id int, subtasks_count int);
TRUNCATE TABLE Tasks;
INSERT INTO Tasks (task_id, subtasks_count)
VALUES ('1', '3'), ('2', '2'), ('3', '4');

CREATE TABLE IF NOT EXISTS Executed (task_id int, subtask_id int);
TRUNCATE TABLE Executed;
INSERT INTO Executed (task_id, subtask_id)
VALUES ('1', '2'), ('3', '1'), ('3', '2'), ('3', '3'), ('3', '4');
has the following as a possible solution when using MySQL version 8.0.23:
WITH RECURSIVE possible_tasks_subtasks AS (
  SELECT task_id, subtasks_count AS max_subtask_count, 1 AS subtask_id
  FROM Tasks
  UNION ALL
  SELECT task_id, max_subtask_count, subtask_id + 1
  FROM possible_tasks_subtasks
  -- using SELECT MAX below is where the problem occurs with Postgres
  WHERE subtask_id < (SELECT MAX(max_subtask_count) FROM Tasks)
)
SELECT P.task_id, P.subtask_id
FROM possible_tasks_subtasks P
  LEFT JOIN Executed E
    ON P.task_id = E.task_id AND P.subtask_id = E.subtask_id
WHERE E.task_id IS NULL OR E.subtask_id IS NULL;
When trying this out with Postgres 13.1, I get the following error:
ERROR: aggregate functions are not allowed in WHERE
This struck me as odd given that a seemingly similar solution (in terms of using SELECT <aggregate-function> in the WHERE clause) is offered in the docs for aggregate functions:
SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
If I modify
WHERE
subtask_id < (SELECT MAX(max_subtask_count) FROM Tasks)
in the solution code block above to be
WHERE
subtask_id < (SELECT max_subtask_count FROM Tasks ORDER BY max_subtask_count DESC LIMIT 1)
then Postgres does not throw an error. As a sanity check, I tried
SELECT * FROM tasks WHERE task_id < (SELECT MAX(subtasks_count) FROM Tasks);
just to make sure I could use SELECT MAX in a subquery for a WHERE clause as the docs suggested, and this worked as expected.
The only determination I can make thus far is that this somehow has to do with how Postgres processes things when using WITH RECURSIVE. But the docs on WITH queries do not say anything about using aggregates in subqueries for WHERE clauses.
What am I missing here? Why does this work in MySQL but not Postgres? But more importantly, why does the solution offered in the docs not seem to work when using WITH RECURSIVE (from my reading and experimenting anyway)?
EDIT: For additional context in terms of the LeetCode problem and what it is asking you to accomplish with your query:
Table: Tasks
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| task_id | int |
| subtasks_count | int |
+----------------+---------+
task_id is the primary key for this table.
Each row in this table indicates that task_id was divided into subtasks_count subtasks labelled from 1 to subtasks_count.
It is guaranteed that 2 <= subtasks_count <= 20.
Table: Executed
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| task_id | int |
| subtask_id | int |
+---------------+---------+
(task_id, subtask_id) is the primary key for this table.
Each row in this table indicates that for the task task_id, the subtask with ID subtask_id was executed successfully.
It is guaranteed that subtask_id <= subtasks_count for each task_id.
Write an SQL query to report the IDs of the missing subtasks for each task_id. Return the result table in any order. The query result format is in the following example:
Tasks table:
+---------+----------------+
| task_id | subtasks_count |
+---------+----------------+
| 1 | 3 |
| 2 | 2 |
| 3 | 4 |
+---------+----------------+
Executed table:
+---------+------------+
| task_id | subtask_id |
+---------+------------+
| 1 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
+---------+------------+
Result table:
+---------+------------+
| task_id | subtask_id |
+---------+------------+
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
+---------+------------+
You don't need the MAX() to find the subtask count from the tasks table. Just carry that information over from the initial query in the recursive part.
I would also use a NOT EXISTS condition to get this result:
with recursive all_subtasks as (
  select task_id, 1 as subtask_id, subtasks_count
  from tasks
  union all
  select t.task_id, p.subtask_id + 1, p.subtasks_count
  from tasks t
    join all_subtasks p on p.task_id = t.task_id
  where p.subtask_id < p.subtasks_count
)
select st.task_id, st.subtask_id
from all_subtasks st
where not exists (select *
                  from executed e
                  where e.task_id = st.task_id
                    and e.subtask_id = st.subtask_id)
order by st.task_id, st.subtask_id;
In Postgres this can be written a bit more simply using generate_series():
select t.task_id, st.subtask_id
from tasks t
  cross join generate_series(1, t.subtasks_count) as st(subtask_id)
where not exists (select *
                  from executed e
                  where e.task_id = t.task_id
                    and e.subtask_id = st.subtask_id)
order by t.task_id;
Online example
As to "why isn't an aggregate allowed in the recursive part" - the answer is quite simple: nobody of the Postgres development team though it was important enough to implement it.
Response by Tom Lane (reproduced here for ease of reading):
As the query is written, the aggregate is over a field of possible_tasks_subtasks, making it illegal in WHERE, just as the error says. (From the point of view of the SELECT FROM Tasks subquery, it's a constant outer reference, not an aggregate of that subquery. This is per SQL spec.)
With this guidance, a successful restated query in PostgreSQL is as follows:
WITH RECURSIVE possible_tasks_subtasks AS (
  SELECT task_id, subtasks_count, 1 AS subtask_id
  FROM Tasks
  UNION ALL
  SELECT task_id, subtasks_count, subtask_id + 1
  FROM possible_tasks_subtasks
  WHERE subtask_id < (SELECT MAX(subtasks_count) FROM Tasks)
)
SELECT P.task_id, P.subtask_id
FROM possible_tasks_subtasks P
  LEFT JOIN Executed E
    ON P.task_id = E.task_id AND P.subtask_id = E.subtask_id
WHERE E.task_id IS NULL AND P.subtasks_count >= P.subtask_id;
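An alternative reading of the same guidance, sketched here rather than taken from the answer above: compute the bound once in a plain CTE, so the recursive member compares against a scalar subquery that contains no aggregate at all:
WITH RECURSIVE bound AS (
  SELECT MAX(subtasks_count) AS max_count FROM Tasks
),
possible_tasks_subtasks AS (
  SELECT task_id, subtasks_count, 1 AS subtask_id
  FROM Tasks
  UNION ALL
  SELECT task_id, subtasks_count, subtask_id + 1
  FROM possible_tasks_subtasks
  WHERE subtask_id < (SELECT max_count FROM bound)
)
SELECT P.task_id, P.subtask_id
FROM possible_tasks_subtasks P
  LEFT JOIN Executed E
    ON P.task_id = E.task_id AND P.subtask_id = E.subtask_id
WHERE E.task_id IS NULL AND P.subtasks_count >= P.subtask_id;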

PostgreSQL JSONB grouping array values inside a hash

We have a PostgreSQL jsonb column containing hashes which in turn contain arrays of values:
id | hashes
---------------
1 | {"sources"=>["a","b","c"], "ids"=>[1,2,3]}
2 | {"sources"=>["b","c","d","e","e"], "ids"=>[1,2,3]}
What we'd like to do is create a jsonb query which would return
code | count
---------------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
We've been trying something along the lines of
SELECT jsonb_to_recordset(hashes->>'sources')
but that's not working - any help with this hugely appreciated...
The setup (this should be a part of the question; note the proper JSON syntax):
create table a_table (id int, hashes jsonb);
insert into a_table values
(1, '{"sources":["a","b","c"], "ids":[1,2,3]}'),
(2, '{"sources":["b","c","d","e","e"], "ids":[1,2,3]}');
Use the function jsonb_array_elements():
select code, count(code)
from
a_table,
jsonb_array_elements(hashes->'sources') sources(code)
group by 1
order by 1;
code | count
------+-------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
(5 rows)
Another option is jsonb_array_elements_text(), which returns the elements as text rather than jsonb:
SELECT h, count(*)
FROM (
  SELECT jsonb_array_elements_text(hashes->'sources') AS h FROM a_table
) sub
GROUP BY h
ORDER BY h;
We finally got this working this way:
SELECT jsonb_array_elements_text(hashes->'sources') AS s1,
       count(jsonb_array_elements_text(hashes->'sources'))
FROM a_table
GROUP BY s1;
but Klin's solution is more complete and both Klin and Patrick got there quicker than us (thank you both) - so points go to them.

Postgresql Update inside For Loop

I'm new enough to PostgreSQL, and I'm having issues updating a column of NULL values in a table using a for loop. The table I'm working on is huge, so for brevity I'll give a smaller example which should get the point across. Take the following table:
+----+---+---+------+
| id | A | B | C    |
+----+---+---+------+
| a  | 1 | 0 | NULL |
| b  | 1 | 1 | NULL |
| c  | 2 | 4 | NULL |
| a  | 3 | 2 | NULL |
| c  | 2 | 3 | NULL |
| d  | 4 | 2 | NULL |
+----+---+---+------+
I want to write a for loop which iterates through all of the rows, does some operation on the values in columns A and B, and then inserts a new value in C.
For example, where id = 'a', update the table and set C = A*B, or where id = 'd' set C = A + B, etc. This would then give me a table like:
+----+---+---+------+
| id | A | B | C    |
+----+---+---+------+
| a  | 1 | 0 | 0    |
| b  | 1 | 1 | NULL |
| c  | 2 | 4 | NULL |
| a  | 3 | 2 | 6    |
| c  | 2 | 3 | NULL |
| d  | 4 | 2 | 6    |
+----+---+---+------+
So ultimately I'd like to loop through all the rows of the table and update column C according to the value in the "id" column. The function I've written (which isn't giving any errors but also isn't updating anything either) looks like this...
-- DROP FUNCTION some_function();
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
--r integer; not too sure if this needs to be declared or not
result int;
BEGIN
FOR r IN select * from 'table_name'
LOOP
select(
case
when id = 'a' THEN B*C
when id = 'd' THEN B+C
end)
into result;
update table set C = result
WHERE id = '';
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql
I'm sure there's something silly I'm missing, probably around what I'm returning... void in this case. But as I only want to update existing rows, do I need to return anything? There are probably easier ways of doing this than using a loop, but I'd like to get it working using this method.
If anyone could point me in the right direction or point out anything blatantly obvious that I'm doing wrong I'd much appreciate it.
Thanks in advance.
No need for a loop or a function, this can be done with a single update statement:
update table_name
set c = case
when id = 'a' then a*b
when id = 'd' then a+b
else c -- don't change anything
end;
SQLFiddle: http://sqlfiddle.com/#!15/b65cb/2
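If rewriting untouched rows is a concern (the CASE above writes c back to itself for every id other than 'a' and 'd'), a variant with a WHERE clause, sketched under the same assumptions, only touches the affected rows:
update table_name
  set c = case
            when id = 'a' then a*b
            when id = 'd' then a+b
          end
where id in ('a', 'd');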
The reason your function isn't doing anything is this:
update table set C = result
WHERE id = '';
You don't have a row with an empty string in the column id. Your function also seems to use the wrong formula: when id = 'a' THEN B*C should probably be then A*B. And as C is NULL initially, B*C will also yield NULL. So even if the update in your loop found a row, it would set C to NULL.
You are also retrieving the values incorrectly from the cursor.
If you really, really want to do it inefficiently in a loop, then your function should look something like this (not tested!):
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
  result int;
  -- r is a structure that contains an element for each column in the select list
  r record;
BEGIN
  FOR r IN select * from table_name
  LOOP
    if r.id = 'a' then
      result := r.a * r.b;
    elsif r.id = 'd' then
      result := r.a + r.b;
    else
      continue; -- leave rows with other ids untouched
    end if;
    update table_name
       set C = result
     WHERE id = r.id; -- note the where condition that uses the value from the record variable
  END LOOP;
END
$BODY$
LANGUAGE plpgsql;
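Once created, the function would be invoked like any other (a usage note, not part of the original answer):
SELECT some_function();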
But again: if your table is "huge" as you say, the loop is an extremely bad solution. Relational databases are made to deal with "sets" of data. Row-by-row processing is an anti-pattern that will almost always have bad performance.
Or to put it the other way round: doing set-based operations (like my single update example) is always the better choice.

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7 | 8 | 9 | 15 |
| 10 | 11 | 12 | 19 |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1 | 2 | 3 | 7 | 8 | 9 | 15 |
| 4 | 5 | 6 | 10 | 11 | 12 | 19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above?
Thanks!
You can use row_number() to join the two result sets, but I'm not sure you have any guarantee that the order of the rows will stay the same as in the tables. So it's better to add some ordering to the over() clause.
with cte1 as (
select
tableid1, tableid2, tableid3, row_number() over() as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
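For instance, if tableid1 happened to reflect the intended row order (an assumption - the question gives no such key), the window call in cte1 would become row_number() over(order by tableid1). As a standalone sketch:
select tableid1, tableid2, tableid3,
       row_number() over (order by tableid1) as rn
from table1;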
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables simply don't have an inherent order, so you cannot rely on fetching rows in any particular order without an ORDER BY.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.
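A sketch of that sane way, with a caveat: backfilling a serial column assigns numbers in whatever order the rows happen to be scanned, so it only makes the pairing reliable going forward. The column name id is an arbitrary choice:
ALTER TABLE table1 ADD COLUMN id serial;
ALTER TABLE table2 ADD COLUMN id serial;

SELECT t1.tableid1, t1.tableid2, t1.tableid3,
       t2.table2id1, t2.table2id2, t2.table2id3, t2.table2id4
FROM table1 t1
JOIN table2 t2 USING (id);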

T-SQL - Complicated recursion with sum

I apologize for the long problem description, but I was unable to break it down more than this.
Before reading on, keep in mind that my end goal is T-SQL (maybe some recursive CTE?). However, a shove in the right direction would be much appreciated (I've tried a million things and have been scratching my head for hours).
Consider the following problem: I have a table of categories which is self-referencing through ParentCategoryID->CategoryID:
-----------------------
| Category |
-----------------------
| *CategoryID |
| Name |
| ParentCategoryID |
-----------------------
This type of table allows me to build a tree structure, say:
-----------------
| parentCategory|
-----------------
/ | \
child(1) child child
/ \
child(2) child(3)
where "child" means "child category" (ignore the numbers, I'll explain them later). Obviously I can have as many children as I like at any level.
Every day, a program I've written stores values to a table "ValueRegistration", which is connected to "Category" like so:
------------------------ ---------------------- ----------------------
| ValueRegistration | | Item | | Category |
------------------------ ---------------------- ----------------------
| *RegID | | *ItemID | | *CategoryID |
| Date |>-------| CategoryID |>---------| Name |
| ItemID | | ItemTypeID | | ParentCategoryID |
| Value | ---------------------- ----------------------
------------------------ Y
|
|
---------------------
| ItemType |
---------------------
| *ItemTypeID |
| ItemType |
---------------------
As you can see, a ValueRegistration concerns a specific Item, which in turn belongs to a certain category. The category may or may not have a parent (and grandparents and great-grandparents and so on). For instance, it may be the child all the way to the bottom left (number 2) in the tree I illustrated above. Also, an Item is of a certain ItemType.
My goal:
I register values to the ValueRegistration table daily (in other words, Date and ItemID combined is also a primary key). I want to be able to retrieve a resultset on the following form:
[ValueRegistration.Date, ItemType.ItemTypeID, Category.CategoryID, Value]
which seems simple enough (it's obviously just a bunch of joins). However, I also want results for rows that don't actually exist in the ValueRegistration table: rows in which the values of sibling nodes for a given date and item type are summed, producing a new row where ValueRegistration.Date and ItemType.ItemTypeID are the same as in the child nodes but CategoryID is that of the child nodes' parent. Keep in mind that an Item will NOT exist for this type of row in the result set.
Consider for instance a scenario where I have ValueRegistrations for child 2 and 3 on a bunch of dates and a bunch of ItemIDs. Obviously, each registration belongs to a certain ItemType and Category. It should be clear to the reader that
ValueRegistration.Date, ItemType.ItemTypeID, Category.CategoryID
is a sufficient key to identify a specific ValueRegistration (in other words, it's possible to solve my problem without having to create temporary Item rows), and so I can inner join all tables and get, for instance, the following result:
ValueReg.Date, ItemType.ItemTypeID, Category.CategoryID, ValueReg.Value
08-mar-2013, 1, 5, 200
08-mar-2013, 1, 6, 250
Assume now that I have four category rows that look like this:
1, category1, NULL
2, category2, 1
5, category5, 2
6, category6, 2
I.e. category 1 is the parent of category 2, and category 2 is the parent of categories 5 and 6. Category 1 has no parent. I now wish to append the following rows to my resultset:
08-mar-2013, 1, 2, (200+250)
08-mar-2013, 1, 1, (200+250+sum(values in all other child nodes of node 1))
Remember:
the solution needs to be recursive, so that it is performed upwards in the tree (until NULL is reached)
an Item row will NOT exist for tree nodes which are calculated, so CategoryID and ItemTypeID must be used
yes, I know I could simply create "virtual" Item rows and add ValueRegistrations when I originally INSERT INTO my database, but that solution is prone to errors, particularly if other programmers code against my database and either forget or are unaware that results must be passed up to the parent node. A solution that calculates this on request instead is much safer and, frankly, much more elegant.
I've tried to set something up along the lines of this, but I seem to get stuck with having to group by Date and ItemTypeID, and that's not allowed in the recursive part of a CTE. The programmer in me just wants to write a recursive function, but I'm really struggling to do that in SQL.
Anyone have an idea where to begin, what things I should try, or even (fingers crossed) a solution?
Thanks!
Alexander
EDIT:
SQL FIDDLE
CREATE TABLE ItemType(
ItemTypeID INT PRIMARY KEY,
ItemType VARCHAR(50)
);
CREATE TABLE Category(
CategoryID INT PRIMARY KEY,
Name VARCHAR(50),
ParentCategoryID INT,
FOREIGN KEY(ParentCategoryID) REFERENCES Category(CategoryID)
);
CREATE TABLE Item(
ItemID INT PRIMARY KEY,
CategoryID INT NOT NULL,
ItemTypeID INT NOT NULL,
FOREIGN KEY(CategoryID) REFERENCES Category(CategoryID),
FOREIGN KEY(ItemTypeID) REFERENCES ItemType(ItemTypeID)
);
CREATE TABLE ValueRegistration(
RegID INT PRIMARY KEY,
Date DATE NOT NULL,
Value INT NOT NULL,
ItemID INT NOT NULL,
FOREIGN KEY(ItemID) REFERENCES Item(ItemID)
);
INSERT INTO ItemType VALUES(1, 'ItemType1');
INSERT INTO ItemType VALUES(2, 'ItemType2');
INSERT INTO Category VALUES(1, 'Category1', NULL); -- Top parent (1)
INSERT INTO Category VALUES(2, 'Category2', 1); -- A child of 1
INSERT INTO Category VALUES(3, 'Category3', 1); -- A child of 1
INSERT INTO Category VALUES(4, 'Category4', 2); -- A child of 2
INSERT INTO Category VALUES(5, 'Category5', 2); -- A child of 2
INSERT INTO Category VALUES(6, 'Category6', NULL); -- Another top parent
INSERT INTO Item VALUES(1, 4, 1); -- Category 4, ItemType 1
INSERT INTO Item VALUES(2, 5, 1); -- Category 5, ItemType 1
INSERT INTO Item VALUES(3, 3, 1); -- Category 3, ItemType 1
INSERT INTO Item VALUES(4, 1, 2); -- Category 1, ItemType 2
INSERT INTO ValueRegistration VALUES(1, '2013-03-08', 100, 1);
INSERT INTO ValueRegistration VALUES(2, '2013-03-08', 200, 2);
INSERT INTO ValueRegistration VALUES(3, '2013-03-08', 300, 3);
INSERT INTO ValueRegistration VALUES(4, '2013-03-08', 400, 4);
INSERT INTO ValueRegistration VALUES(5, '2013-03-09', 120, 1);
INSERT INTO ValueRegistration VALUES(6, '2013-03-09', 220, 2);
INSERT INTO ValueRegistration VALUES(7, '2013-03-09', 320, 3);
INSERT INTO ValueRegistration VALUES(8, '2013-03-09', 420, 4);
-- -------------------- RESULTSET I WANT ----------------------
-- vr.Date | ItemType | CategoryTypeID | Value
-- ------------------------------------------------------------
-- 2013-03-08 | 'ItemType1' | 'Category4' | 100 Directly available
-- 2013-03-08 | 'ItemType1' | 'Category5' | 200 Directly available
-- 2013-03-08 | 'ItemType1' | 'Category3' | 300 Directly available
-- 2013-03-08 | 'ItemType1' | 'Category2' | 100+200 Calculated tree node
-- 2013-03-08 | 'ItemType1' | 'Category1' | 100+200+300 Calculated tree node
-- 2013-03-08 | 'ItemType2' | 'Category1' | 400 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category4' | 120 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category5' | 220 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category3' | 320 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category2' | 120+220 Calculated tree node
-- 2013-03-09 | 'ItemType1' | 'Category1' | 120+220+320 Calculated tree node
-- 2013-03-09 | 'ItemType2' | 'Category1' | 420 Directly available
If you replace all joins to the table Category with joins to this dynamic relation, you will get the hierarchy you are looking for:
with Category as (
select * from ( values
(1,'Fred',null),
(2,'Joan',1),
(3,'Greg',2),
(4,'Jack',2),
(5,'Jill',4),
(6,'Bill',3),
(7,'Sam',6)
) Category(CategoryID,Name,ParentCategoryID)
)
, Hierarchy as (
select 0 as [Level],* from Category
-- where ParentCategoryID is null
union all
select super.[Level]+1, sub.CategoryID, super.Name, super.ParentCategoryID
from Category as sub
join Hierarchy as super on super.CategoryID = sub.ParentCategoryID and sub.ParentCategoryID is not null
)
select * from Hierarchy
-- where CategoryID = 6
-- order by [Level], CategoryID
For example, uncommenting the two lines at the bottom will yield this result set:
Level CategoryID Name ParentCategoryID
----------- ----------- ---- ----------------
0 6 Bill 3
1 6 Greg 2
2 6 Joan 1
3 6 Fred NULL
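To connect this back to the fiddle schema, here is a hedged sketch (not part of the answer above) of the roll-up itself. It carries the ancestor's own ID through the recursion, so registrations can be summed at every level of the tree:
WITH Ancestors AS (
    -- each category paired with itself...
    SELECT CategoryID, CategoryID AS AncestorID, ParentCategoryID
    FROM Category
    UNION ALL
    -- ...and with every ancestor above it
    SELECT a.CategoryID, c.CategoryID, c.ParentCategoryID
    FROM Ancestors AS a
    JOIN Category AS c ON c.CategoryID = a.ParentCategoryID
)
SELECT vr.Date, it.ItemType, a.AncestorID AS CategoryID, SUM(vr.Value) AS Value
FROM ValueRegistration AS vr
JOIN Item AS i ON i.ItemID = vr.ItemID
JOIN ItemType AS it ON it.ItemTypeID = i.ItemTypeID
JOIN Ancestors AS a ON a.CategoryID = i.CategoryID
GROUP BY vr.Date, it.ItemType, a.AncestorID;
Against the sample data this yields exactly the result set sketched in the question, e.g. Category2 on 2013-03-08 sums to 300 (100+200) and Category1 to 600.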