I'm trying to write a query to produce a list of all nodes in a tree given a root, and also the paths (using names the parents give their children) taken to get there. The recursive CTE I have working is a textbook CTE straight from the docs here, however, it's proven difficult to get the paths working in this case.
Following the git model, names are given to children by their parents as a result of paths created by traversing the tree. This implies a map to children ids like git's tree structure.
I've been looking online for a solution for a recursive query but they all seem to contain solutions that use parent ids, or materialized paths, which all would break structural sharing concepts that Rich Hickey's database as value talk is all about.
current implementation
Imagine the objects table is dead simple (for simplicity sake, let's assume integer ids):
drop table if exists objects;
create table objects (
id INT,
data jsonb
);
-- A
-- / \
-- B C
-- / \ \
-- D E F
INSERT INTO objects (id, data) VALUES
(1, '{"content": "data for f"}'), -- F
(2, '{"content": "data for e"}'), -- E
(3, '{"content": "data for d"}'), -- D
(4, '{"nodes":{"f":{"id":1}}}'), -- C
(5, '{"nodes":{"d":{"id":2}, "e":{"id":3}}}'), -- B
(6, '{"nodes":{"b":{"id":5}, "c":{"id":4}}}') -- A
;
drop table if exists work_tree;
create table work_tree (
id INT NOT NULL,
path text,
ref text,
data jsonb,
primary key (ref, id) -- TODO change to ref, path
);
create or replace function get_nested_ids_array(data jsonb) returns int[] as $$
select array_agg((value->>'id')::int) as nested_id
from jsonb_each(data->'nodes')
$$ LANGUAGE sql STABLE;
create or replace function checkout(root_id int, ref text) returns void as $$
with recursive nodes(id, nested_ids, data) AS (
select id, get_nested_ids_array(data), data
from objects
where id = root_id
union
select child.id, get_nested_ids_array(child.data), child.data
from objects child, nodes parent
where child.id = ANY(parent.nested_ids)
)
INSERT INTO work_tree (id, data, ref)
select id, data, ref from nodes
$$ language sql VOLATILE;
SELECT * FROM checkout(6, 'master');
SELECT * FROM work_tree;
If you are familiar, these objects' data property look similar to git blobs/trees, mapping names to ids or storing content. So imagine you want to create an index, so, after a "checkout", you need to query for the list of nodes, and potentially the paths to produce a working tree or index:
Current Output:
id path ref data
6 NULL master {"nodes":{"b":{"id":5}, "c":{"id":4}}}
4 NULL master {"nodes":{"d":{"id":2}, "e":{"id":3}}}
5 NULL master {"nodes":{"f":{"id":1}}}
1 NULL master {"content": "data for d"}
2 NULL master {"content": "data for e"}
3 NULL master {"content": "data for f"}
Desired Output:
id path ref data
6 / master {"nodes":{"b":{"id":5}, "c":{"id":4}}}
4 /b master {"nodes":{"d":{"id":2}, "e":{"id":3}}}
5 /c master {"nodes":{"f":{"id":1}}}
1 /b/d master {"content": "data for d"}
2 /b/e master {"content": "data for e"}
3 /c/f master {"content": "data for f"}
What's the best way to aggregate path in this case? I'm aware that I'm compressing the information when I call get_nested_ids_array when I do the recursive query, so not sure with this top-down approach how to properly aggregate with the CTE.
EDIT use case for children ids
to explain more about why I need to use children ids instead of parent:
Imagine a data structure like so:
A
/ \
B C
/ \ \
D E F
If you make a modification to F, you only add a new root A', and children nodes C', and F', leaving the old tree in tact:
A' A
/ \ / \
C' B C
/ / \ \
F' D E F
If you make a deletion, you only add a new root A" that only points to B and you still have A if you ever need to time travel (and they share the same objects, just like git!):
A" A
\ / \
B C
/ \ \
D E F
So it seems that the best way to achieve this is with children ids so children can have multiple parents - across time and space! If you think there's another way to achieve this, by all means, let me know!
Edit #2 case for not using parent_ids
Using parent_ids has cascading effects that requires editing the entire tree. For example,
A
/ \
B C
/ \ \
D E F
If you make a modification to F, you still need a new root A' to maintain immutability. And if we use parent_ids, then that means both B and C now have a new parent. Hence, you can see how it ripples through the entire tree immediately requiring every node is touched:
A A'
/ \ / \
B C B' C'
/ \ \ / \ \
D E F D' E' F'
EDIT #3 use case for parents giving names to children
We can make a recursive query where objects store their own name, but the question I'm asking is specifically about constructing a path where the names are given to children from their parents. This is modeling a data structure similar to the git tree, for example if you see this git graph pictured below, in the 3rd commit there is a tree (a folder) bak that points to the original root which represents a folder of all the files at the 1st commit. If that root object had it's own name, it wouldn't be possible to achieve this so simply as adding a ref. That's the beauty of git, it's as simple as making a reference to a hash and giving it a name.
This is the relationship I'm setting up which is why the jsonb data structure exists, it's to provide a mapping from a name to an id (hash in git's case). I know it's not ideal, but it does the job of providing the hash map. If there's another way to create this mapping of names to ids, and thus, a way for parents to give names to children in a top-down tree, I'm all ears!
Any help is appreciated!
Store the parent of a node instead of its children. It is a simpler and cleaner solution, in which you do not need structured data types.
This is an exemplary model with the data equivalent to that in the question:
create table objects (
id int primary key,
parent_id int,
label text,
content text);
insert into objects values
(1, 4, 'f', 'data for f'),
(2, 5, 'e', 'data for e'),
(3, 5, 'd', 'data for d'),
(4, 6, 'c', ''),
(5, 6, 'b', ''),
(6, 0, 'a', '');
And a recursive query:
with recursive nodes(id, path, content) as (
select id, label, content
from objects
where parent_id = 0
union all
select o.id, concat(path, '->', label), o.content
from objects o
join nodes n on n.id = o.parent_id
)
select *
from nodes
order by id desc;
id | path | content
----+---------+------------
6 | a |
5 | a->b |
4 | a->c |
3 | a->b->d | data for d
2 | a->b->e | data for e
1 | a->c->f | data for f
(6 rows)
The variant with children_ids.
drop table if exists objects;
create table objects (
id int primary key,
children_ids int[],
label text,
content text);
insert into objects values
(1, null, 'f', 'data for f'),
(2, null, 'e', 'data for e'),
(3, null, 'd', 'data for d'),
(4, array[1], 'c', ''),
(5, array[2,3], 'b', ''),
(6, array[4,5], 'a', '');
with recursive nodes(id, children, path, content) as (
select id, children_ids, label, content
from objects
where id = 6
union all
select o.id, o.children_ids, concat(path, '->', label), o.content
from objects o
join nodes n on o.id = any(n.children)
)
select *
from nodes
order by id desc;
id | children | path | content
----+----------+---------+------------
6 | {4,5} | a |
5 | {2,3} | a->b |
4 | {1} | a->c |
3 | | a->b->d | data for d
2 | | a->b->e | data for e
1 | | a->c->f | data for f
(6 rows)
#klin's excellent answer inspired me to experiment with PostgreSQL, trees (paths), and recursive CTE! :-D
Preamble: my motivation is storing data in PostgreSQL, but visualizing those data in a graph. While the approach here has limitations (e.g. undirected edges; ...), it may otherwise be useful in other contexts.
Here, I adapted #klins code to enable CTE without a dependence on the table id, though I do use those to deal with the issue of loops in the data, e.g.
a,b
b,a
that throw the CTE into a nonterminating loop.
To solve that, I employed the rather brilliant approach suggested by #a-horse-with-no-name in SO 31739150 -- see my comments in the script, below.
PSQL script ("tree with paths.sql"):
-- File: /mnt/Vancouver/Programming/data/metabolism/practice/sql/tree_with_paths.sql
-- Adapted from: https://stackoverflow.com/questions/44620695/recursive-path-aggregation-and-cte-query-for-top-down-tree-postgres
-- See also: /mnt/Vancouver/FC/RDB - PostgreSQL/Recursive CTE - Graph Algorithms in a Database Recursive CTEs and Topological Sort with Postgres.pdf
-- https://www.fusionbox.com/blog/detail/graph-algorithms-in-a-database-recursive-ctes-and-topological-sort-with-postgres/620/
-- Run this script in psql, at the psql# prompt:
-- \! cd /mnt/Vancouver/Programming/data/metabolism/practice/sql/
-- \i /mnt/Vancouver/Programming/data/metabolism/practice/sql/tree_with_paths.sql
\c practice
DROP TABLE tree;
CREATE TABLE tree (
-- id int primary key
id SERIAL PRIMARY KEY
,s TEXT -- s: source node
,t TEXT -- t: target node
,UNIQUE(s, t)
);
INSERT INTO tree(s, t) VALUES
('a','b')
,('b','a') -- << adding this 'back relation' breaks CTE_1 below, as it enters a loop and cannot terminate
,('b','c')
,('b','d')
,('c','e')
,('d','e')
,('e','f')
,('f','g')
,('g','h')
,('c','h');
SELECT * FROM tree;
-- SELECT s,t FROM tree WHERE s='b';
-- RECURSIVE QUERY 1 (CTE_1):
-- WITH RECURSIVE nodes(src, path, tgt) AS (
-- SELECT s, concat(s, '->', t), t FROM tree WHERE s = 'a'
-- -- SELECT s, concat(s, '->', t), t FROM tree WHERE s = 'c'
-- UNION ALL
-- SELECT t.s, concat(path, '->', t), t.t FROM tree t
-- JOIN nodes n ON n.tgt = t.s
-- )
-- -- SELECT * FROM nodes;
-- SELECT path FROM nodes;
-- RECURSIVE QUERY 2 (CTE_2):
-- Deals with "loops" in Postgres data, per
-- https://stackoverflow.com/questions/31739150/to-find-infinite-recursive-loop-in-cte
-- "With Postgres it's quite easy to prevent this by collecting all visited nodes in an array."
WITH RECURSIVE nodes(id, src, path, tgt) AS (
SELECT id, s, concat(s, '->', t), t
,array[id] as all_parent_ids
FROM tree WHERE s = 'a'
UNION ALL
SELECT t.id, t.s, concat(path, '->', t), t.t, all_parent_ids||t.id FROM tree t
JOIN nodes n ON n.tgt = t.s
AND t.id <> ALL (all_parent_ids) -- this is the trick to exclude the endless loops
)
-- SELECT * FROM nodes;
SELECT path FROM nodes;
Script execution / output (PSQL):
# \i tree_with_paths.sql
You are now connected to database "practice" as user "victoria".
DROP TABLE
CREATE TABLE
INSERT 0 10
id | s | t
----+---+---
1 | a | b
2 | b | a
3 | b | c
4 | b | d
5 | c | e
6 | d | e
7 | e | f
8 | f | g
9 | g | h
10 | c | h
path
---------------------
a->b
a->b->a
a->b->c
a->b->d
a->b->c->e
a->b->d->e
a->b->c->h
a->b->d->e->f
a->b->c->e->f
a->b->c->e->f->g
a->b->d->e->f->g
a->b->d->e->f->g->h
a->b->c->e->f->g->h
You can change the starting node (e.g. start at node "d") in the SQL script -- giving, e.g.:
# \i tree_with_paths.sql
...
path
---------------
d->e
d->e->f
d->e->f->g
d->e->f->g->h
Network visualization:
I exported those data (at the PSQL prompt) to a CSV,
# \copy (SELECT s, t FROM tree) TO '/tmp/tree.csv' WITH CSV
COPY 9
# \! cat /tmp/tree.csv
a,b
b,c
b,d
c,e
d,e
e,f
f,g
g,h
c,h
... which I visualized (image above) in a Python 3.5 venv:
>>> import networkx as nx
>>> import pylab as plt
>>> G = nx.read_edgelist("/tmp/tree.csv", delimiter=",")
>>> G.nodes()
['b', 'a', 'd', 'f', 'c', 'h', 'g', 'e']
>>> G.edges()
[('b', 'a'), ('b', 'd'), ('b', 'c'), ('d', 'e'), ('f', 'g'), ('f', 'e'), ('c', 'e'), ('c', 'h'), ('h', 'g')]
>>> G.number_of_nodes()
8
>>> G.number_of_edges()
9
>>> from networkx.drawing.nx_agraph import graphviz_layout
## There is a bug in Python or NetworkX: you may need to run this
## command 2x, as you may get an error the first time:
>>> nx.draw(G, pos=graphviz_layout(G), node_size=1200, node_color='lightblue', linewidths=0.25, font_size=10, font_weight='bold', with_labels=True)
>>> plt.show()
>>> nx.dijkstra_path(G, 'a', 'h')
['a', 'b', 'c', 'h']
>>> nx.dijkstra_path(G, 'a', 'f')
['a', 'b', 'd', 'e', 'f']
Note that the dijkstra_path returned from NetworkX is one of several possible, whereas all paths are returned by the Postgres CTE in a visually-appealing manner.
I have this very weird issue with One2many field.
First let me explain you the scenario...
I have a One2many field in sale.order.line, below code will explain the structure better
class testModule(models.Model):
_name = 'test.module'
name = fields.Char()
class testModule2(models.Model):
_name = 'test.module2'
location_id = fields.Many2one('test.module')
field1 = fields.Char()
field2 = fields.Many2one('sale.order.line')
class testModule3(models.Model):
_inherit = 'sale.order.line'
test_location = fields.One2many('test.module2', 'field2')
CASE 1:
Now what is happening is that when i create a new sales order, i select the partner_id and then add a sale.order.line and inside this line i add the One2many field test_location and then i save.
CASE 2:
Create new sales order, select partner_id then add sale.order.line and inside the sale.order.line add the test_location line [close the sales order line window]. Now after the entry before hitting save i change a field say partner_id and then click save.
CASE 3:
this case is same as case 2 but with the addition that i again change the partner_id field [changes made total 2 times first of case2 and then now], then i click on save.
RESULTS
CASE 1 works fine.
CASE 2 has a issue of
odoo.sql_db: bad query: INSERT INTO "test_module2" ("id", "field2", "field1", "location_id", "create_uid", "write_uid", "create_date", "write_date") VALUES(nextval('test_module2_id_seq'), 27, 'asd', ARRAY[1, '1'], 1, 1, (now() at time zone 'UTC'), (now() at time zone 'UTC')) RETURNING id
ProgrammingError: column "location_id" is of type integer but expression is of type integer[]
LINE 1: ...VALUES(nextval('test_module2_id_seq'), 27, 'asd', ARRAY[1, '...
now for this case i put a debugger on create/write method of sale.order.line to see waht the values are getting passed..
values = {u'product_uom': 1, u'sequence': 0, u'price_unit': 885, u'product_uom_qty': 1, u'qty_invoiced': 0, u'procurement_ids': [[5]], u'qty_delivered': 0, u'qty_to_invoice': 0, u'qty_delivered_updateable': False, u'customer_lead': 0, u'analytic_tag_ids': [[5]], u'state': u'draft', u'tax_id': [[5]], u'test_location': [[5], [0, 0, {u'field1': u'asd', u'location_id': [1, u'1']}]], 'order_id': 20, u'price_subtotal': 885, u'discount': 0, u'layout_category_id': False, u'product_id': 29, u'price_total': 885, u'invoice_status': u'no', u'name': u'[CARD] Graphics Card', u'invoice_lines': [[5]]}
in the above values location_id is getting passed like u'location_id': [1, u'1']}]] which is not correct...so for this i correct the issue in code and the update the values and pass that...
CASE 3
if the user changes the field say 2 or more than 2 times then the values are
values = {u'invoice_lines': [[5]], u'procurement_ids': [[5]], u'tax_id': [[5]], u'test_location': [[5], [1, 7, {u'field1': u'asd', u'location_id': False}]], u'analytic_tag_ids': [[5]]}
here
u'location_id': False
MULTIPLE CASE
if the user does case 1 the on the same record does case 2 or case 3 then sometimes the line will be saved as field2 = Null or False in the database other values like location_id and field1 will have data but not field2
NOTE: THIS HAPPENS WITH ANY FIELD NOT ONLY PARTNER_ID FIELD ON HEADER LEVEL OF SALE ORDER
I tried debugging myself but couldn't find the reason why this is happening .
I'm trying to traverse some nodes and I want to stop the traversal when the furthest reached node matches a certain condition.
My data is some sequentially connected nodes - there is a sample data creation script here (note this is much simplified from my real data, just to illustrate the problem):
create database plocal:people
create class Person extends V
create property Person.name string
create property Person.age float
create property Person.ident integer
insert into Person(name,age,ident) VALUES ("Bob", 30.5, 1)
insert into Person(name,age,ident) VALUES ("Bob", 30.5, 2)
insert into Person(name,age,ident) VALUES ("Carol", 20.3, 3)
insert into Person(name,age,ident) VALUES ("Carol", 19, 4)
insert into Person(name,age,ident) VALUES ("Laura", 75, 5)
insert into Person(name,age,ident) VALUES ("Laura", 60.5, 6)
insert into Person(name,age,ident) VALUES ("Laura", 46, 7)
insert into Person(name,age,ident) VALUES ("Mike", 16.3, 8)
insert into Person(name,age,ident) VALUES ("David", 86, 9)
insert into Person(name,age,ident) VALUES ("Alice", 5, 10)
insert into Person(name,age,ident) VALUES ("Nigel", 69, 11)
insert into Person(name,age,ident) VALUES ("Carol", 60, 12)
insert into Person(name,age,ident) VALUES ("Mike", 16.3, 13)
insert into Person(name,age,ident) VALUES ("Alice", 5, 14)
insert into Person(name,age,ident) VALUES ("Mike", 16.3, 15)
create class NEXT extends E
create edge NEXT from (select from Person where ident = 1) to (select from Person where ident = 3)
create edge NEXT from (select from Person where ident = 2) to (select from Person where ident = 4)
create edge NEXT from (select from Person where ident = 8) to (select from Person where ident = 12)
create edge NEXT from (select from Person where ident = 5) to (select from Person where ident = 15)
create edge NEXT from (select from Person where ident = 15) to (select from Person where ident = 14)
create edge NEXT from (select from Person where ident = 7) to (select from Person where ident = 13)
create edge NEXT from (select from Person where ident = 13) to (select from Person where ident = 10)
Let's define a sequence as a traversal of nodes from a node with no incoming links (a start node). I am trying to write a query that will return me all sequences of nodes up until the first occurrence of a specific name encountered in the traversal. Suppose the specific name is 'Mike'. So for this data I would want the following sequences to be found:
("Laura", 75, 5) -> ("Mike", 16.3, 15),
("Laura", 46, 7) -> ("Mike", 16.3, 13),
("Mike", 16.3, 8)
I can get the sequence from a specific node to the node just before a 'Mike'. The following query returns me record ("Laura", 75, 5), as the record after that one has name 'Mike'
traverse out('NEXT') from (select from Person where ident = 5) while name <> 'Mike'
The following query returns me two rows, one with record ("Laura", 75, 5) and one with record ("Mike", 16.3, 15), as the record after Mike has name 'Alice'.
traverse out('NEXT') from (select from Person where ident = 5) while name <> 'Alice'
I have two problems with this - firstly I'd like to include the node which matches the condition in the sequence (i.e. when checking for a Person called 'Mike', I'd like 'Mike' to be the final node in the sequence returned)
For that I assume I need to store the traversal in an object, and request one more out-Next for that object before returning. I've tried various approaches to storing the traversal in an object in the middle of a query, but I'm just not getting it. Here is an example (which errors):
select from (
select $seq from
(select from Person where ident = 5)
let $seq = traverse out('NEXT') from $current while name <> 'Alice'
<... here append the next node ...>
)
Secondly, this query just starts from one node - I'd like to start at all starting nodes and return a sequence ending in 'Mike' wherever there is one. I'm hoping that once I can store the traversal in an object, it should be relatively straight-forward to just run from multiple starting points rather than just one.
(Of course, another option for this specific query is to find all the nodes that match the specific condition (e.g. name= 'Mike') and work backwards from those, but I'd really like to see it work in the way I described initially as I'll need that general approach for more things later.)
I suspect quite a lot of my issue is that I'm really struggling to work out how to use the let statement in OrientDB - I'm really not understanding how the scope works, which objects exist at what stages of the query. If anyone knows any good documentation out there other than the official docs that would be really useful as I've read those and I'm still not getting it.
So any helpful hints on how to answer this question, or where to find more information on how to write this type of query would be really useful.
I hope it can help you
select expand($c) let $b=(select expand(out("NEXT")[name="Alice"]) from (select expand(last($a)) from (select from Person where ident = 5)
let $a = (traverse out('NEXT') from $current while name <> 'Alice')) limit 1), $c=unionAll($a,$b)
select name, list from (select name,$c.name as list from Person
let $b=( select expand(out("NEXT")[name="Alice"]) from (select expand(last($a)) from $parent.$current
let $a = (traverse out('NEXT') from $current while name <> 'Alice')) limit 1),
$c=unionAll($a,$b) where in("NEXT").size()=0)
where list contains "Alice"