T-SQL - Complicated recursion with sum

I apologize for the long problem description, but I was unable to break it down more than this.
Before reading on, keep in mind that my end goal is T-SQL (maybe some recursive CTE?). However, a shove in the right direction would be much appreciated (I've tried a million things and have been scratching my head for hours).
Consider the following problem: I have a table of categories which is self-referencing through ParentCategoryID->CategoryID:
-----------------------
| Category |
-----------------------
| *CategoryID |
| Name |
| ParentCategoryID |
-----------------------
This type of table allows me to build a tree structure, say:
          ------------------
          | parentCategory |
          ------------------
           /      |      \
     child(1)   child   child
      /    \
child(2)  child(3)
where "child" means "child category" (ignore the numbers, I'll explain them later). Obviously I can have as many children as I like at any level.
Every day, a program I've written stores values to a table "ValueRegistration", which is connected to "Category" like so:
------------------------        ----------------------          ----------------------
| ValueRegistration    |        | Item               |          | Category           |
------------------------        ----------------------          ----------------------
| *RegID               |        | *ItemID            |          | *CategoryID        |
| Date                 |>-------| CategoryID         |>---------| Name               |
| ItemID               |        | ItemTypeID         |          | ParentCategoryID   |
| Value                |        ----------------------          ----------------------
------------------------                  Y
                                          |
                                          |
                                ---------------------
                                | ItemType          |
                                ---------------------
                                | *ItemTypeID       |
                                | ItemType          |
                                ---------------------
As you can see, a ValueRegistration concerns a specific Item, which in turn belongs to a certain category. The category may or may not have a parent (and grandparents and great-grandparents and so on). For instance, it may be the child all the way to the bottom left (number 2) in the tree I illustrated above. Also, an Item is of a certain ItemType.
My goal:
I register values to the ValueRegistration table daily (in other words, the combination of Date and ItemID is also a candidate key). I want to be able to retrieve a result set of the following form:
[ValueRegistration.Date, ItemType.ItemTypeID, Category.CategoryID, Value]
which seems simple enough (it's obviously just a bunch of joins). However, I also want results for rows that actually don't exist in the ValueRegistration table, namely results in which the values of sibling nodes for a given date and itemID are summed, and a new row is produced where ValueRegistration.Date and ItemType.ItemTypeID are the same as in the child nodes but where CategoryID is that of the parent of the child nodes. Keep in mind that an Item will NOT exist for this type of row in the resultset.
Consider for instance a scenario where I have ValueRegistrations for child 2 and 3 on a bunch of dates and a bunch of ItemIDs. Obviously, each registration belongs to a certain ItemType and Category. It should be clear to the reader that
ValueRegistration.Date, ItemType.ItemTypeID, Category.CategoryID
is a sufficient key to identify a specific ValueRegistration (in other words, it's possible to solve my problem without having to create temporary Item rows), and so I can inner join all tables and get, for instance, the following result:
ValueReg.Date, ItemType.ItemTypeID, Category.CategoryID, ValueReg.Value
08-mar-2013, 1, 5, 200
08-mar-2013, 1, 6, 250
Assume now that I have four category rows that look like this:
1, category1, NULL
2, category2, 1
5, category5, 2
6, category6, 2
I.e. category 1 is the parent of category 2, and category 2 is the parent of categories 5 and 6. Category 1 has no parent. I now wish to append the following rows to my resultset:
08-mar-2013, 1, 2, (200+250)
08-mar-2013, 1, 1, (200+250+sum(values in all other child nodes of node 1))
Remember:
the solution needs to be recursive, so that it is performed upwards in the tree (until NULL is reached)
an Item row will NOT exist for tree nodes which are calculated, so CategoryID and ItemTypeID must be used
yes, I know I could simply create "virtual" Item rows and add ValueRegistrations when I originally INSERT INTO my database, but that solution is prone to errors, particularly if other programmers code against my database but either forget or are unaware that results must be passed up to the parent node. A solution that calculates this on request instead is much safer and, frankly, much more elegant.
I've tried to set something up along the lines of this, but I seem to get stuck on having to group by Date and ItemTypeID, which is not allowed in the recursive part of a CTE. The programmer in me just wants to write a recursive function, but I'm really struggling to do that in SQL.
Anyone have an idea where to begin, what things I should try, or even (fingers crossed) a solution?
Thanks!
Alexander
EDIT:
SQL FIDDLE
CREATE TABLE ItemType(
ItemTypeID INT PRIMARY KEY,
ItemType VARCHAR(50)
);
CREATE TABLE Category(
CategoryID INT PRIMARY KEY,
Name VARCHAR(50),
ParentCategoryID INT,
FOREIGN KEY(ParentCategoryID) REFERENCES Category(CategoryID)
);
CREATE TABLE Item(
ItemID INT PRIMARY KEY,
CategoryID INT NOT NULL,
ItemTypeID INT NOT NULL,
FOREIGN KEY(CategoryID) REFERENCES Category(CategoryID),
FOREIGN KEY(ItemTypeID) REFERENCES ItemType(ItemTypeID)
);
CREATE TABLE ValueRegistration(
RegID INT PRIMARY KEY,
Date DATE NOT NULL,
Value INT NOT NULL,
ItemID INT NOT NULL,
FOREIGN KEY(ItemID) REFERENCES Item(ItemID)
);
INSERT INTO ItemType VALUES(1, 'ItemType1');
INSERT INTO ItemType VALUES(2, 'ItemType2');
INSERT INTO Category VALUES(1, 'Category1', NULL); -- Top parent (1)
INSERT INTO Category VALUES(2, 'Category2', 1); -- A child of 1
INSERT INTO Category VALUES(3, 'Category3', 1); -- A child of 1
INSERT INTO Category VALUES(4, 'Category4', 2); -- A child of 2
INSERT INTO Category VALUES(5, 'Category5', 2); -- A child of 2
INSERT INTO Category VALUES(6, 'Category6', NULL); -- Another top parent
INSERT INTO Item VALUES(1, 4, 1); -- Category 4, ItemType 1
INSERT INTO Item VALUES(2, 5, 1); -- Category 5, ItemType 1
INSERT INTO Item VALUES(3, 3, 1); -- Category 3, ItemType 1
INSERT INTO Item VALUES(4, 1, 2); -- Category 1, ItemType 2
INSERT INTO ValueRegistration VALUES(1, '2013-03-08', 100, 1);
INSERT INTO ValueRegistration VALUES(2, '2013-03-08', 200, 2);
INSERT INTO ValueRegistration VALUES(3, '2013-03-08', 300, 3);
INSERT INTO ValueRegistration VALUES(4, '2013-03-08', 400, 4);
INSERT INTO ValueRegistration VALUES(5, '2013-03-09', 120, 1);
INSERT INTO ValueRegistration VALUES(6, '2013-03-09', 220, 2);
INSERT INTO ValueRegistration VALUES(7, '2013-03-09', 320, 3);
INSERT INTO ValueRegistration VALUES(8, '2013-03-09', 420, 4);
-- -------------------- RESULTSET I WANT ----------------------
-- vr.Date | ItemType | CategoryTypeID | Value
-- ------------------------------------------------------------
-- 2013-03-08 | 'ItemType1' | 'Category4' | 100 Directly available
-- 2013-03-08 | 'ItemType1' | 'Category5' | 200 Directly available
-- 2013-03-08 | 'ItemType1' | 'Category3' | 300 Directly available
-- 2013-03-08 | 'ItemType1' | 'Category2' | 100+200 Calculated tree node
-- 2013-03-08 | 'ItemType1' | 'Category1' | 100+200+300 Calculated tree node
-- 2013-03-08 | 'ItemType2' | 'Category1' | 400 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category4' | 120 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category5' | 220 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category3' | 320 Directly available
-- 2013-03-09 | 'ItemType1' | 'Category2' | 120+220 Calculated tree node
-- 2013-03-09 | 'ItemType1' | 'Category1' | 120+220+320 Calculated tree node
-- 2013-03-09 | 'ItemType2' | 'Category1' | 420 Directly available

If you replace all joins to the table Category with joins to this dynamic relation, you will get the hierarchy you are looking for:
with Category as (
select * from ( values
(1,'Fred',null),
(2,'Joan',1),
(3,'Greg',2),
(4,'Jack',2),
(5,'Jill',4),
(6,'Bill',3),
(7,'Sam',6)
) Category(CategoryID,Name,ParentCategoryID)
)
, Hierarchy as (
select 0 as [Level],* from Category
-- where Parent is null
union all
select super.[Level]+1, sub.CategoryID, super.Name, super.ParentCategoryID
from Category as sub
join Hierarchy as super on super.CategoryID = sub.ParentCategoryID and sub.ParentCategoryID is not null
)
select * from Hierarchy
-- where CategoryID = 6
-- order by [Level], CategoryID
For example, uncommenting the two lines at the bottom will yield this result set:
Level CategoryID Name ParentCategoryID
----------- ----------- ---- ----------------
0 6 Bill 3
1 6 Greg 2
2 6 Joan 1
3 6 Fred NULL
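The ancestor expansion above can be joined to the asker's tables so that every registration is counted once for its own category and once for each ancestor. The following is only a minimal sketch of that idea, not the answerer's query: it uses Python's built-in sqlite3 module purely so that it runs anywhere, and the CTE name Ancestry is a made-up name for illustration. SQLite requires the RECURSIVE keyword, which T-SQL omits. The GROUP BY sits in the outer query rather than in the recursive member, which is what avoids the restriction mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed copy of the question's fiddle schema and data.
conn.executescript("""
CREATE TABLE Category(CategoryID INTEGER PRIMARY KEY, Name TEXT, ParentCategoryID INTEGER);
CREATE TABLE Item(ItemID INTEGER PRIMARY KEY, CategoryID INTEGER, ItemTypeID INTEGER);
CREATE TABLE ValueRegistration(RegID INTEGER PRIMARY KEY, Date TEXT, Value INTEGER, ItemID INTEGER);
INSERT INTO Category VALUES
  (1,'Category1',NULL),(2,'Category2',1),(3,'Category3',1),
  (4,'Category4',2),(5,'Category5',2),(6,'Category6',NULL);
INSERT INTO Item VALUES (1,4,1),(2,5,1),(3,3,1),(4,1,2);
INSERT INTO ValueRegistration VALUES
  (1,'2013-03-08',100,1),(2,'2013-03-08',200,2),(3,'2013-03-08',300,3),(4,'2013-03-08',400,4),
  (5,'2013-03-09',120,1),(6,'2013-03-09',220,2),(7,'2013-03-09',320,3),(8,'2013-03-09',420,4);
""")

# Ancestry (hypothetical name) pairs each category with itself and all of its ancestors.
ROLLUP_SQL = """
WITH RECURSIVE Ancestry(CategoryID, AncestorID) AS (
    SELECT CategoryID, CategoryID FROM Category
    UNION ALL
    SELECT a.CategoryID, c.ParentCategoryID
    FROM Ancestry AS a
    JOIN Category AS c ON c.CategoryID = a.AncestorID
    WHERE c.ParentCategoryID IS NOT NULL
)
SELECT vr.Date, i.ItemTypeID, a.AncestorID AS CategoryID, SUM(vr.Value) AS Value
FROM ValueRegistration AS vr
JOIN Item AS i ON i.ItemID = vr.ItemID
JOIN Ancestry AS a ON a.CategoryID = i.CategoryID
GROUP BY vr.Date, i.ItemTypeID, a.AncestorID
ORDER BY vr.Date, i.ItemTypeID, a.AncestorID
"""

rows = conn.execute(ROLLUP_SQL).fetchall()
for row in rows:
    print(row)
```

On the fiddle data this produces twelve rows. For example, Category2 gets 100+200 for ItemType 1 on 2013-03-08, and Category1 gets 100+200+300.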

Why is there a difference on UPDATE query results when a `UNIQUE INDEX` is involved?

I stumbled upon "Why would I get a duplicate key error when updating a row?", so I tried a few things on https://extendsclass.com/postgresql-online.html.
Given the following schema:
create table scientist (id integer PRIMARY KEY, firstname varchar(100), lastname varchar(100));
insert into scientist (id, firstname, lastname) values (1, 'albert', 'einstein');
insert into scientist (id, firstname, lastname) values (2, 'isaac', 'newton');
insert into scientist (id, firstname, lastname) values (3, 'marie', 'curie');
select * from scientist;
CREATE UNIQUE INDEX fl_idx ON scientist(firstname, lastname);
when I run this query:
UPDATE scientist AS c SET
firstname = new_values.F,
lastname = new_values.L
FROM (
SELECT * FROM
UNNEST(
ARRAY[1, 1]::numeric[],
ARRAY['one', 'v']::text[],
ARRAY['three', 'f']::text[]
) AS T(
I,
F,
L
)
) AS new_values
WHERE c.id = new_values.I
RETURNING c.id, c.firstname, c.lastname;
I get back:
id firstname lastname
1 one three
whereas if I don't create the index (CREATE UNIQUE INDEX fl_idx ON scientist(firstname, lastname);) I get:
id firstname lastname
1 v f
So I am not sure why the UNIQUE INDEX affects the result. I also don't understand why, given that id is a PRIMARY KEY, there is no "duplicate key value violates unique constraint" exception (similar to what happens in the SO question I mentioned above) when I change my UNNEST to:
UNNEST(
ARRAY[1, 1]::numeric[],
ARRAY['one', 'one']::text[],
ARRAY['three', 'three']::text[]
)
The PostgreSQL version I ran the above queries on was:
PostgreSQL 11.11 (Debian 11.11-0+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
From the documentation for UPDATE:
When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
In your case you have two matches for id 1, so the choice depends entirely on the order in which the rows are read.
Here is an example where the index is present for both runs and the results still differ:
db<>fiddle demo 1
db<>fiddle demo 2
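One way to act on the documentation's advice is to check, before running the UPDATE, whether any target id appears more than once in the source rows, and to pick one row per id deterministically. The sketch below is only an illustration of that check. It uses Python's sqlite3 module rather than PostgreSQL, the table name new_values mirrors the alias in the question, and the seq column is an assumption added here to record input order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column recording the order in which the rows were supplied.
conn.executescript("""
CREATE TABLE new_values(seq INTEGER PRIMARY KEY, i INTEGER, f TEXT, l TEXT);
INSERT INTO new_values VALUES
  (1, 1, 'one', 'three'),
  (2, 1, 'v', 'f'),
  (3, 2, 'isaac', 'newton');
""")

# Ids that would join to more than one source row, i.e. the unpredictable case.
ambiguous = conn.execute("""
SELECT i, COUNT(*) FROM new_values
GROUP BY i HAVING COUNT(*) > 1
ORDER BY i
""").fetchall()

# Deterministic choice: keep only the last supplied row for each id.
deduped = conn.execute("""
SELECT nv.i, nv.f, nv.l
FROM new_values AS nv
WHERE nv.seq = (SELECT MAX(seq) FROM new_values WHERE i = nv.i)
ORDER BY nv.i
""").fetchall()

print(ambiguous)
print(deduped)
```

In PostgreSQL the same idea could probably be expressed with UNNEST ... WITH ORDINALITY and DISTINCT ON, but that variant is not tested here.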
Do you know why I don't get the "duplicate key value violates unique constraint" error?
There is no duplicate key, neither on column id nor on the pair firstname/lastname, after the update is performed.
Scenario 1:
+-----+------------+----------+
| id | firstname | lastname |
+-----+------------+----------+
| 2 | isaac | newton |
| 3 | marie | curie |
| 1 | v | f |
+-----+------------+----------+
Scenario 2:
+-----+------------+----------+
| id | firstname | lastname |
+-----+------------+----------+
| 2 | isaac | newton |
| 3 | marie | curie |
| 1 | one | three |
+-----+------------+----------+
EDIT:
Using "UPSERT" and trying to insert/update row twice:
INSERT INTO scientist (id,firstname, lastname)
VALUES (1, 'one', 'three'), (1, 'v', 'f')
ON CONFLICT (id)
DO UPDATE SET firstname = excluded.firstname;
-- ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time

Text array of integers for joining tables in Postgres

I have two tables
Entity
House
In house table, there is a column entity_id of type text.
And entity_id will store multiple house ids.
So it will look like
entity_id (text)
------------------
[1,2]
[3,6]
Now I have to join this entity table with the house table.
How can I achieve this?
I know this may not be a good design, but it is now my responsibility to work with it.
This should work:
CREATE TABLE entity(
id integer,
entity_id integer[],
details text
);
CREATE TABLE houses(
id integer,
house_name text
);
INSERT INTO entity(id, entity_id, details)
VALUES
(1, '{1,3}', 'Left side houses'),
(2, '{2,4}', 'Right side houses');
INSERT INTO houses(id, house_name)
VALUES
(1, 'Left 1'),
(2, 'Right 1'),
(3, 'Left 2'),
(4, 'Right 2');
----------------------------------------------
SELECT h.id as house_id, h.house_name, e.details
FROM houses h
LEFT JOIN entity e
ON h.id = ANY(e.entity_id);
| house_id | house_name | details |
| -------- | ---------- | ----------------- |
| 1 | Left 1 | Left side houses |
| 2 | Right 1 | Right side houses |
| 3 | Left 2 | Left side houses |
| 4 | Right 2 | Right side houses |
As Laurenz says in a comment, this looks like a data modelling error.
So I will not answer the question directly, but will instead give a more correct structure:
CREATE TABLE entity (id int primary key);
CREATE TABLE house (id int primary key);
CREATE TABLE entity_house (id bigserial, identity int references entity(id), idhouse int references house(id));
INSERT INTO entity VALUES (1), (2);
INSERT INTO house VALUES (3), (56);
INSERT INTO entity_house (identity, idhouse) VALUES (1,56), (2,56);
SELECT e.*, h.*
FROM entity_house eh
INNER JOIN entity e ON eh.identity = e.id
INNER JOIN house h ON eh.idhouse = h.id;
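If the existing text column really holds values such as [1,2], the rows for a junction table like entity_house could be derived from it once. The sketch below shows one possible migration using Python's sqlite3 and json modules. It assumes the text column lives in entity (as in the first answer) and that every value parses as a JSON list of integers. Neither assumption is confirmed by the question.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity(id INTEGER PRIMARY KEY, entity_id TEXT, details TEXT);
CREATE TABLE house(id INTEGER PRIMARY KEY, house_name TEXT);
CREATE TABLE entity_house(
  identity INTEGER REFERENCES entity(id),
  idhouse  INTEGER REFERENCES house(id)
);
INSERT INTO entity VALUES (1, '[1,3]', 'Left side houses'), (2, '[2,4]', 'Right side houses');
INSERT INTO house VALUES (1, 'Left 1'), (2, 'Right 1'), (3, 'Left 2'), (4, 'Right 2');
""")

# Expand each text list into one (entity id, house id) pair per element.
source_rows = conn.execute("SELECT id, entity_id FROM entity").fetchall()
pairs = [(eid, hid) for eid, raw in source_rows for hid in json.loads(raw)]
conn.executemany("INSERT INTO entity_house(identity, idhouse) VALUES (?, ?)", pairs)

# With the junction table populated, the join is an ordinary equi-join.
joined = conn.execute("""
SELECT h.id, h.house_name, e.details
FROM entity_house AS eh
JOIN entity AS e ON e.id = eh.identity
JOIN house  AS h ON h.id = eh.idhouse
ORDER BY h.id
""").fetchall()

for row in joined:
    print(row)
```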

Maintaining order in DB2 "IN" query

This question is based on this one. I'm looking for a solution to that question that works in DB2. Here is the original question:
I have the following table
DROP TABLE IF EXISTS `test`.`foo`;
CREATE TABLE `test`.`foo` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Then I try to get records based on the primary key
SELECT * FROM foo f where f.id IN (2, 3, 1);
I then get the following result
+----+--------+
| id | name |
+----+--------+
| 1 | first |
| 2 | second |
| 3 | third |
+----+--------+
3 rows in set (0.00 sec)
As one can see, the result is ordered by id. What I'm trying to achieve is to get the results ordered in the sequence I'm providing in the query. Given this example it should return
+----+--------+
| id | name |
+----+--------+
| 2 | second |
| 3 | third |
| 1 | first |
+----+--------+
3 rows in set (0.00 sec)
You could use a derived table with the IDs you want, and the order you want, and then join the table in, something like...
SELECT ...
FROM mcscb.mcs_premise prem
JOIN mcscb.mcs_serv_deliv_id serv
ON prem.prem_nb = serv.prem_nb
AND prem.tech_col_user_id = serv.tech_col_user_id
AND prem.tech_col_version = serv.tech_col_version
JOIN (
SELECT 1, '9486154876' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 2, '9403149581' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 3, '9465828230' FROM SYSIBM.SYSDUMMY1
) B (ORD, ID)
ON serv.serv_deliv_id = B.ID
WHERE serv.tech_col_user_id = 'CRSSJEFF'
AND serv.tech_col_version = '00'
ORDER BY B.ORD
You can use a derived column to do custom ordering.
select
case
when serv.SERV_DELIV_ID = '9486154876' then 1
when serv.SERV_DELIV_ID = '9403149581' then 2
else 3
end as custom_order,
...
...
ORDER BY custom_order
To make the logic a little bit more evident you might modify the solution provided by bhamby like so:
WITH ordered_in_list (ord, id) as (
VALUES (1, '9486154876'), (2, '9403149581'), (3, '9465828230')
)
SELECT ...
FROM mcscb.mcs_premise prem
JOIN mcscb.mcs_serv_deliv_id serv
ON prem.prem_nb = serv.prem_nb
AND prem.tech_col_user_id = serv.tech_col_user_id
AND prem.tech_col_version = serv.tech_col_version
JOIN ordered_in_list il
ON serv.serv_deliv_id = il.ID
WHERE serv.tech_col_user_id = 'CRSSJEFF'
AND serv.tech_col_version = '00'
ORDER BY il.ORD
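The same ordered-list idea can be tried on the foo table from the original question. The sketch below is an illustration only: it runs against SQLite through Python's sqlite3 module instead of DB2, and it builds the VALUES list from a Python list named wanted, a name invented for this example. The ids are passed as bound parameters rather than concatenated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo(id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO foo VALUES (1, 'first'), (2, 'second'), (3, 'third');
""")

wanted = [2, 3, 1]  # ids in the order the caller wants them back

# One "(?, ?)" pair per id; each pair carries (position, id).
placeholders = ", ".join("(?, ?)" for _ in wanted)
params = [value for pair in enumerate(wanted, start=1) for value in pair]

sql = f"""
WITH ordered_in_list(ord, id) AS (VALUES {placeholders})
SELECT f.id, f.name
FROM foo AS f
JOIN ordered_in_list AS il ON f.id = il.id
ORDER BY il.ord
"""

rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```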

How to view linked data in OrientDB

I have two tables, described as below.
Table 1: countries
c_id, int
c_name, varchar(20) (PK)
Sample records in this table are.
c_id | c_name
1 | USA
2 | UK
3 | PAK
Table 2: immigrants
i_id, int
i_name, varchar(20)
i_country, int (FK)
Sample records in this table are.
i_id | i_name | i_country
1 | John | 1
2 | Graham | 2
3 | Ali | 3
Question 1:
I want to create two classes (tables) in OrientDB but I need to know what should be the data type of the FK field and what to insert in it. I mean what should I write in query to insert the id of the PK table. Does it need to be #rid? How?
QUESTION 2:
What is the OrientDB SQL for producing the following output.
i_id | i_name | i_country | c_id | c_name
1 | John | 1 | 1 | USA
2 | Graham | 2 | 2 | UK
3 | Ali | 3 | 3 | PAK
With OrientDB you can avoid the creation of FK fields and join operations by using direct Edge links between records. For example:
create class Country extends V;
create class Immigrant extends V;
create class comesFrom extends E;
create property Country.c_id integer;
create property Country.c_name String;
create property Immigrant.i_id integer;
create property Immigrant.i_name String;
insert into Country(c_id, c_name) values (1, 'USA');
insert into Country(c_id, c_name) values (2, 'UK');
insert into Country(c_id, c_name) values (3, 'PAK');
insert into Immigrant(i_id, i_name) values (1, 'John');
insert into Immigrant(i_id, i_name) values (2, 'Graham');
insert into Immigrant(i_id, i_name) values (3, 'Ali');
Now you can directly connect the records you want (I used an Edge called 'comesFrom' and subqueries to link the id fields, but you could also use the #RID field directly):
create edge comesFrom from (select from Immigrant where i_id = 1) to (select from Country where c_id = 1);
create edge comesFrom from (select from Immigrant where i_id = 2) to (select from Country where c_id = 2);
create edge comesFrom from (select from Immigrant where i_id = 3) to (select from Country where c_id = 3);
Finally you can query the fields you want without join operations:
select i_id, i_name, out('comesFrom').c_id as c_id, out('comesFrom').c_name as c_name
from Immigrant unwind c_id, c_name
----+------+----+------+----+------
# |#CLASS|i_id|i_name|c_id|c_name
----+------+----+------+----+------
0 |null |1 |John |1 |USA
1 |null |2 |Graham|2 |UK
2 |null |3 |Ali |3 |PAK
----+------+----+------+----+------
In Orient Studio you should obtain a graph like this:
OrientDB does not support join.
You should declare the i_country field in class Immigrant as Link to Country class, or you can use Graph Model if you want bidirectional relationship.
If you want to use a direct link,
you can insert immigrants like this:
insert into Immigrant set id=1, name ='John', country = (select from country where id = 1 )
then
select id,name,country.id,country.name from Immigrant
you could speed up the insertion and link operations with a javascript function like this:
so the edge is automatically created when you decide to create a new person. You can also decide where to place the person:
Could it be helpful for you ?
I think that you have created a Document DB because I reproduced the issue by creating a Document DB and not a Graph DB and I get, like you, the same exception. This is because if you want to work with Vertices and Edges you must use the Graph DB type.

Getting Breadcrumbs in Postgres

I have a table that represents a hierarchy by referencing itself.
create table nodes (
id integer primary key,
parent_id integer references nodes (id),
name varchar(255)
);
Given a specific node, I would like to find all of its parents in order, as breadcrumbs. For example, given this data:
insert into nodes (id,parent_id,name) values
(1,null,'Root'),
(2,1,'Left'),
(3,1,'Right'),
(4,2,'LeftLeft'),
(5,2,'LeftRight'),
(6,5,'LeftRightLeft');
If I wanted to start at id=5 I would expect the result to be:
id | depth | name
-- | ----- | ----
1 | 0 | 'Root'
2 | 1 | 'Left'
5 | 2 | 'LeftRight'
I don't care if the depth column is present, but I included it for clarity, to show that there should only be one result for each depth and that results should be in order of depth. I don't care if it's ascending or descending. The purpose of this is to be able to print out some breadcrumbs that look like this:
(1)Root \ (2)Left \ (5)LeftRight
The basic recursive query would look like this:
with recursive tree(id, name, parent_id) as (
select n.id, n.name, n.parent_id
from nodes n
where n.id = 5
union all
select n.id, n.name, n.parent_id
from nodes n
join tree t on (n.id = t.parent_id)
)
select *
from tree;
Demo: http://sqlfiddle.com/#!15/713f8/1
That will give you everything you need to rebuild the path from id = 5 back to the root.
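To get the rows in breadcrumb order, the recursive query can also carry a counter of how many steps each row is from the starting node. The sketch below shows one way this might look. It runs the query in SQLite through Python's sqlite3 module rather than in Postgres, and the hops column is a name introduced here for illustration. Sorting by hops descending puts the root first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes(
  id INTEGER PRIMARY KEY,
  parent_id INTEGER REFERENCES nodes(id),
  name TEXT
);
INSERT INTO nodes(id, parent_id, name) VALUES
  (1, NULL, 'Root'), (2, 1, 'Left'), (3, 1, 'Right'),
  (4, 2, 'LeftLeft'), (5, 2, 'LeftRight'), (6, 5, 'LeftRightLeft');
""")

# hops (hypothetical column) counts steps upward from the starting node.
BREADCRUMB_SQL = """
WITH RECURSIVE tree(id, name, parent_id, hops) AS (
    SELECT n.id, n.name, n.parent_id, 0
    FROM nodes AS n
    WHERE n.id = ?
    UNION ALL
    SELECT n.id, n.name, n.parent_id, t.hops + 1
    FROM nodes AS n
    JOIN tree AS t ON n.id = t.parent_id
)
SELECT id, name FROM tree ORDER BY hops DESC
"""

rows = conn.execute(BREADCRUMB_SQL, (5,)).fetchall()
breadcrumbs = " \\ ".join(f"({node_id}){name}" for node_id, name in rows)
print(breadcrumbs)
```

For the sample data and a start at id = 5, this prints (1)Root \ (2)Left \ (5)LeftRight.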