recursive path aggregation and CTE query for top-down tree postgres - postgresql

I'm trying to write a query that, given a root, produces a list of all nodes in a tree together with the paths taken to reach them (built from the names parents give their children). The recursive CTE I have working is a textbook CTE straight from the docs; however, it has proven difficult to get the paths working in this case.
Following the git model, names are given to children by their parents as a result of paths created by traversing the tree. This implies a map to children ids like git's tree structure.
I've been looking online for a recursive-query solution, but everything I find uses parent ids or materialized paths, all of which would break the structural sharing concepts that Rich Hickey's database-as-value talk is all about.
current implementation
Imagine the objects table is dead simple (for simplicity's sake, let's assume integer ids):
drop table if exists objects;
create table objects (
id INT,
data jsonb
);
--       A
--      / \
--     B   C
--    / \   \
--   D   E   F
INSERT INTO objects (id, data) VALUES
(1, '{"content": "data for f"}'), -- F
(2, '{"content": "data for e"}'), -- E
(3, '{"content": "data for d"}'), -- D
(4, '{"nodes":{"f":{"id":1}}}'), -- C
(5, '{"nodes":{"d":{"id":3}, "e":{"id":2}}}'), -- B
(6, '{"nodes":{"b":{"id":5}, "c":{"id":4}}}') -- A
;
drop table if exists work_tree;
create table work_tree (
id INT NOT NULL,
path text,
ref text,
data jsonb,
primary key (ref, id) -- TODO change to ref, path
);
create or replace function get_nested_ids_array(data jsonb) returns int[] as $$
select array_agg((value->>'id')::int) as nested_id
from jsonb_each(data->'nodes')
$$ LANGUAGE sql STABLE;
create or replace function checkout(root_id int, ref text) returns void as $$
with recursive nodes(id, nested_ids, data) AS (
select id, get_nested_ids_array(data), data
from objects
where id = root_id
union
select child.id, get_nested_ids_array(child.data), child.data
from objects child, nodes parent
where child.id = ANY(parent.nested_ids)
)
INSERT INTO work_tree (id, data, ref)
select id, data, ref from nodes
$$ language sql VOLATILE;
SELECT * FROM checkout(6, 'master');
SELECT * FROM work_tree;
If you are familiar with git, these objects' data property looks similar to git blobs/trees: it either maps names to ids or stores content. So imagine you want to create an index: after a "checkout", you need to query for the list of nodes, and potentially the paths, to produce a working tree or index:
Current Output:
id  path  ref     data
6   NULL  master  {"nodes":{"b":{"id":5}, "c":{"id":4}}}
5   NULL  master  {"nodes":{"d":{"id":3}, "e":{"id":2}}}
4   NULL  master  {"nodes":{"f":{"id":1}}}
3   NULL  master  {"content": "data for d"}
2   NULL  master  {"content": "data for e"}
1   NULL  master  {"content": "data for f"}
Desired Output:
id  path  ref     data
6   /     master  {"nodes":{"b":{"id":5}, "c":{"id":4}}}
5   /b    master  {"nodes":{"d":{"id":3}, "e":{"id":2}}}
4   /c    master  {"nodes":{"f":{"id":1}}}
3   /b/d  master  {"content": "data for d"}
2   /b/e  master  {"content": "data for e"}
1   /c/f  master  {"content": "data for f"}
What's the best way to aggregate path in this case? I'm aware that I'm compressing the information when I call get_nested_ids_array when I do the recursive query, so not sure with this top-down approach how to properly aggregate with the CTE.
EDIT use case for children ids
to explain more about why I need to use children ids instead of parent:
Imagine a data structure like so:
      A
     / \
    B   C
   / \   \
  D   E   F
If you make a modification to F, you only add a new root A' and the new child nodes C' and F', leaving the old tree intact:
    A'      A
   /  \   /  \
  C'    B     C
 /     / \     \
F'    D   E     F
If you make a deletion, you only add a new root A" that only points to B and you still have A if you ever need to time travel (and they share the same objects, just like git!):
A"    A
  \  / \
   B    C
  / \    \
 D   E    F
So it seems that the best way to achieve this is with children ids so children can have multiple parents - across time and space! If you think there's another way to achieve this, by all means, let me know!
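The copy-on-write behaviour described above can be sketched in Python. This is only an illustrative in-memory model, not the poster's implementation: the `store` dict, `put`, and `update` are hypothetical names, and ids are assigned as "largest id so far plus one" purely for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the objects table: id -> (content, children),
# where children maps a parent-assigned name to a child id.
Obj = Tuple[Optional[str], Dict[str, int]]


def put(store: Dict[int, Obj], content: Optional[str], children: Dict[str, int]) -> int:
    """Append a new object (existing ones are never modified) and return its id."""
    new_id = max(store, default=0) + 1
    store[new_id] = (content, dict(children))
    return new_id


def update(store: Dict[int, Obj], root_id: int, path: List[str], content: str) -> int:
    """Return the id of a new root in which the node at `path` holds `content`.

    Only the nodes along `path` are copied; every other subtree is shared.
    """
    old_content, children = store[root_id]
    if not path:
        return put(store, content, children)
    name, rest = path[0], path[1:]
    new_children = dict(children)
    new_children[name] = update(store, children[name], rest, content)
    return put(store, old_content, new_children)


store: Dict[int, Obj] = {
    1: ("data for f", {}),       # F
    2: ("data for e", {}),       # E
    3: ("data for d", {}),       # D
    4: (None, {"f": 1}),         # C
    5: (None, {"d": 3, "e": 2}), # B
    6: (None, {"b": 5, "c": 4}), # A
}
new_root = update(store, 6, ["c", "f"], "new data for f")
```

After the call, three objects have been added (F', C', A'), the new root still points at the original B, and the old root 6 is untouched.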
Edit #2 case for not using parent_ids
Using parent_ids has cascading effects that require editing the entire tree. For example,
      A
     / \
    B   C
   / \   \
  D   E   F
If you make a modification to F, you still need a new root A' to maintain immutability. And if we use parent_ids, that means both B and C now have a new parent. Hence, the change ripples through the entire tree, immediately requiring that every node be touched:
      A              A'
     / \            / \
    B   C          B'  C'
   / \   \        / \   \
  D   E   F      D'  E'  F'
EDIT #3 use case for parents giving names to children
We can make a recursive query where objects store their own name, but the question I'm asking is specifically about constructing a path where the names are given to children by their parents. This models a data structure similar to the git tree. For example, in the git graph pictured below, the 3rd commit contains a tree (a folder) bak that points to the original root, which represents a folder of all the files at the 1st commit. If that root object had its own name, it wouldn't be possible to achieve this as simply as adding a ref. That's the beauty of git: it's as simple as making a reference to a hash and giving it a name.
This is the relationship I'm setting up which is why the jsonb data structure exists, it's to provide a mapping from a name to an id (hash in git's case). I know it's not ideal, but it does the job of providing the hash map. If there's another way to create this mapping of names to ids, and thus, a way for parents to give names to children in a top-down tree, I'm all ears!
Any help is appreciated!

Store the parent of a node instead of its children. It is a simpler and cleaner solution, in which you do not need structured data types.
This is an example model with data equivalent to that in the question:
create table objects (
id int primary key,
parent_id int,
label text,
content text);
insert into objects values
(1, 4, 'f', 'data for f'),
(2, 5, 'e', 'data for e'),
(3, 5, 'd', 'data for d'),
(4, 6, 'c', ''),
(5, 6, 'b', ''),
(6, 0, 'a', '');
And a recursive query:
with recursive nodes(id, path, content) as (
select id, label, content
from objects
where parent_id = 0
union all
select o.id, concat(path, '->', label), o.content
from objects o
join nodes n on n.id = o.parent_id
)
select *
from nodes
order by id desc;
id | path | content
----+---------+------------
6 | a |
5 | a->b |
4 | a->c |
3 | a->b->d | data for d
2 | a->b->e | data for e
1 | a->c->f | data for f
(6 rows)
The variant with children_ids.
drop table if exists objects;
create table objects (
id int primary key,
children_ids int[],
label text,
content text);
insert into objects values
(1, null, 'f', 'data for f'),
(2, null, 'e', 'data for e'),
(3, null, 'd', 'data for d'),
(4, array[1], 'c', ''),
(5, array[2,3], 'b', ''),
(6, array[4,5], 'a', '');
with recursive nodes(id, children, path, content) as (
select id, children_ids, label, content
from objects
where id = 6
union all
select o.id, o.children_ids, concat(path, '->', label), o.content
from objects o
join nodes n on o.id = any(n.children)
)
select *
from nodes
order by id desc;
id | children | path | content
----+----------+---------+------------
6 | {4,5} | a |
5 | {2,3} | a->b |
4 | {1} | a->c |
3 | | a->b->d | data for d
2 | | a->b->e | data for e
1 | | a->c->f | data for f
(6 rows)
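When the names live in the parent rather than in the node itself, as in the question's jsonb data, the same top-down walk works as long as each name/id pair is kept together instead of being collapsed to a bare id array. The Python sketch below only illustrates that traversal; `checkout` here is a hypothetical helper, and the ids follow the tree comment in the question (D = 3, E = 2). In SQL, the analogous step would presumably be to expand the pairs with jsonb_each in the recursive term, but that is an untested assumption.

```python
from collections import deque

# Same shape as the question's objects table: id -> data.
objects = {
    1: {"content": "data for f"},
    2: {"content": "data for e"},
    3: {"content": "data for d"},
    4: {"nodes": {"f": {"id": 1}}},
    5: {"nodes": {"d": {"id": 3}, "e": {"id": 2}}},
    6: {"nodes": {"b": {"id": 5}, "c": {"id": 4}}},
}


def checkout(objects, root_id):
    """Breadth-first walk that carries the path down from parent to child."""
    rows = []
    queue = deque([(root_id, "/")])
    while queue:
        node_id, path = queue.popleft()
        rows.append((node_id, path))
        # The parent supplies the name; the child only supplies its id.
        for name, ref in objects[node_id].get("nodes", {}).items():
            child_path = path.rstrip("/") + "/" + name
            queue.append((ref["id"], child_path))
    return rows


rows = checkout(objects, 6)
```

Because a node never stores its own name, the same object can appear under different paths in different roots without being copied.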

@klin's excellent answer inspired me to experiment with PostgreSQL, trees (paths), and recursive CTEs! :-D
Preamble: my motivation is storing data in PostgreSQL, but visualizing those data in a graph. While the approach here has limitations (e.g. undirected edges; ...), it may otherwise be useful in other contexts.
Here, I adapted @klin's code to enable a CTE that does not depend on the table id for traversal, though I do use the ids to deal with the issue of loops in the data, e.g.
a,b
b,a
that throw the CTE into a nonterminating loop.
To solve that, I employed the rather brilliant approach suggested by @a-horse-with-no-name in SO 31739150 -- see my comments in the script, below.
PSQL script ("tree with paths.sql"):
-- File: /mnt/Vancouver/Programming/data/metabolism/practice/sql/tree_with_paths.sql
-- Adapted from: https://stackoverflow.com/questions/44620695/recursive-path-aggregation-and-cte-query-for-top-down-tree-postgres
-- See also: /mnt/Vancouver/FC/RDB - PostgreSQL/Recursive CTE - Graph Algorithms in a Database Recursive CTEs and Topological Sort with Postgres.pdf
-- https://www.fusionbox.com/blog/detail/graph-algorithms-in-a-database-recursive-ctes-and-topological-sort-with-postgres/620/
-- Run this script in psql, at the psql# prompt:
-- \! cd /mnt/Vancouver/Programming/data/metabolism/practice/sql/
-- \i /mnt/Vancouver/Programming/data/metabolism/practice/sql/tree_with_paths.sql
\c practice
DROP TABLE tree;
CREATE TABLE tree (
-- id int primary key
id SERIAL PRIMARY KEY
,s TEXT -- s: source node
,t TEXT -- t: target node
,UNIQUE(s, t)
);
INSERT INTO tree(s, t) VALUES
('a','b')
,('b','a') -- << adding this 'back relation' breaks CTE_1 below, as it enters a loop and cannot terminate
,('b','c')
,('b','d')
,('c','e')
,('d','e')
,('e','f')
,('f','g')
,('g','h')
,('c','h');
SELECT * FROM tree;
-- SELECT s,t FROM tree WHERE s='b';
-- RECURSIVE QUERY 1 (CTE_1):
-- WITH RECURSIVE nodes(src, path, tgt) AS (
-- SELECT s, concat(s, '->', t), t FROM tree WHERE s = 'a'
-- -- SELECT s, concat(s, '->', t), t FROM tree WHERE s = 'c'
-- UNION ALL
-- SELECT t.s, concat(path, '->', t), t.t FROM tree t
-- JOIN nodes n ON n.tgt = t.s
-- )
-- -- SELECT * FROM nodes;
-- SELECT path FROM nodes;
-- RECURSIVE QUERY 2 (CTE_2):
-- Deals with "loops" in Postgres data, per
-- https://stackoverflow.com/questions/31739150/to-find-infinite-recursive-loop-in-cte
-- "With Postgres it's quite easy to prevent this by collecting all visited nodes in an array."
WITH RECURSIVE nodes(id, src, path, tgt) AS (
SELECT id, s, concat(s, '->', t), t
,array[id] as all_parent_ids
FROM tree WHERE s = 'a'
UNION ALL
SELECT t.id, t.s, concat(path, '->', t), t.t, all_parent_ids||t.id FROM tree t
JOIN nodes n ON n.tgt = t.s
AND t.id <> ALL (all_parent_ids) -- this is the trick to exclude the endless loops
)
-- SELECT * FROM nodes;
SELECT path FROM nodes;
Script execution / output (PSQL):
# \i tree_with_paths.sql
You are now connected to database "practice" as user "victoria".
DROP TABLE
CREATE TABLE
INSERT 0 10
id | s | t
----+---+---
1 | a | b
2 | b | a
3 | b | c
4 | b | d
5 | c | e
6 | d | e
7 | e | f
8 | f | g
9 | g | h
10 | c | h
path
---------------------
a->b
a->b->a
a->b->c
a->b->d
a->b->c->e
a->b->d->e
a->b->c->h
a->b->d->e->f
a->b->c->e->f
a->b->c->e->f->g
a->b->d->e->f->g
a->b->d->e->f->g->h
a->b->c->e->f->g->h
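The loop guard used in CTE_2 (carry the list of edge ids already used on each path, and refuse to reuse one) can be mirrored in plain Python. This is a sketch of the same idea, not a translation of the executor's behaviour; `paths_from` is a hypothetical name, and the row order within a level may differ from what Postgres prints.

```python
# (id, source, target): the same rows as the tree table above.
edges = [
    (1, "a", "b"), (2, "b", "a"), (3, "b", "c"), (4, "b", "d"), (5, "c", "e"),
    (6, "d", "e"), (7, "e", "f"), (8, "f", "g"), (9, "g", "h"), (10, "c", "h"),
]


def paths_from(start, edges):
    """Enumerate all paths from `start`, level by level, using each edge at most once per path."""
    frontier = [([eid], [s, t]) for eid, s, t in edges if s == start]
    results = []
    while frontier:
        next_frontier = []
        for used, nodes in frontier:
            results.append("->".join(nodes))
            for eid, s, t in edges:
                # Same test as: t.id <> ALL (all_parent_ids)
                if s == nodes[-1] and eid not in used:
                    next_frontier.append((used + [eid], nodes + [t]))
        frontier = next_frontier
    return results


paths = paths_from("a", edges)
```

Since every path can use each edge only once, the enumeration is guaranteed to stop even though the data contains the a/b cycle.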
You can change the starting node (e.g. start at node "d") in the SQL script -- giving, e.g.:
# \i tree_with_paths.sql
...
path
---------------
d->e
d->e->f
d->e->f->g
d->e->f->g->h
Network visualization:
I exported those data (at the PSQL prompt) to a CSV,
# \copy (SELECT s, t FROM tree) TO '/tmp/tree.csv' WITH CSV
COPY 9
# \! cat /tmp/tree.csv
a,b
b,c
b,d
c,e
d,e
e,f
f,g
g,h
c,h
... which I visualized (image above) in a Python 3.5 venv:
>>> import networkx as nx
>>> import pylab as plt
>>> G = nx.read_edgelist("/tmp/tree.csv", delimiter=",")
>>> G.nodes()
['b', 'a', 'd', 'f', 'c', 'h', 'g', 'e']
>>> G.edges()
[('b', 'a'), ('b', 'd'), ('b', 'c'), ('d', 'e'), ('f', 'g'), ('f', 'e'), ('c', 'e'), ('c', 'h'), ('h', 'g')]
>>> G.number_of_nodes()
8
>>> G.number_of_edges()
9
>>> from networkx.drawing.nx_agraph import graphviz_layout
## There is a bug in Python or NetworkX: you may need to run this
## command 2x, as you may get an error the first time:
>>> nx.draw(G, pos=graphviz_layout(G), node_size=1200, node_color='lightblue', linewidths=0.25, font_size=10, font_weight='bold', with_labels=True)
>>> plt.show()
>>> nx.dijkstra_path(G, 'a', 'h')
['a', 'b', 'c', 'h']
>>> nx.dijkstra_path(G, 'a', 'f')
['a', 'b', 'd', 'e', 'f']
Note that the dijkstra_path returned by NetworkX is only one of several possible paths (a shortest one), whereas the Postgres CTE returns all paths, in a visually appealing manner.

Related

postgresql the difference between with or without inverse transition function

Quote from 38.12.1. Moving-Aggregate Mode
The forward transition function for moving-aggregate mode is not
allowed to return null as the new state value. If the inverse
transition function returns null, this is taken as an indication that
the inverse function cannot reverse the state calculation for this
particular input, and so the aggregate calculation will be redone from
scratch for the current frame starting position.
-- create aggregates that record the series of transform calls (these are
-- intentionally not true inverses)
create function logging_sfunc_strict(text, anyelement)
returns text as
$$
select $1 || '*' || quote_nullable($2)
$$
LANGUAGE sql strict IMMUTABLE;
create or replace function logging_msfunc_strict(text,anyelement)
returns text as
$$
select $1 || '+' || quote_nullable($2)
$$
LANGUAGE sql strict IMMUTABLE;
create or replace function logging_minvfunc_strict(text, anyelement)
returns text as
$$
select $1 || '-' || quote_nullable($2)
$$
LANGUAGE sql strict IMMUTABLE;
create aggregate logging_agg_strict(text)
(
stype = text,
sfunc = logging_sfunc_strict,
mstype = text,
msfunc = logging_msfunc_strict,
minvfunc = logging_minvfunc_strict
);
create aggregate logging_agg_strict_initcond(anyelement)
(
stype = text,
sfunc = logging_sfunc_strict,
mstype = text,
msfunc = logging_msfunc_strict,
minvfunc = logging_minvfunc_strict,
initcond = 'I',
minitcond = 'MI'
);
Executing the following query:
SELECT
p::text || ',' || i::text || ':' || COALESCE(v::text, 'NULL') AS _row,
logging_agg_strict (v) OVER w AS nstrict,
logging_agg_strict_initcond (v) OVER w AS nstrict
FROM (
VALUES (1, 1, NULL),
(1, 2, 'a'),
(1, 3, 'b'),
(1, 4, NULL),
(1, 5, NULL),
(1, 6, 'c'),
(2, 1, NULL),
(2, 2, 'x'),
(3, 1, 'z')) AS t (p, i, v)
WINDOW w AS (PARTITION BY p ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW);
returns:
_row | nstrict | nstrict
----------+-----------+----------------
1,1:NULL | [[null]] | MI
1,2:a | a | MI+'a'
1,3:b | a+'b' | MI+'a'+'b'
1,4:NULL | a+'b'-'a' | MI+'a'+'b'-'a'
1,5:NULL | [[null]] | MI
1,6:c | c | MI+'c'
2,1:NULL | [[null]] | MI
2,2:x | x | MI+'x'
3,1:z | z | MI+'z'
(9 rows)
For now I don't understand the row 1,4:NULL | a+'b'-'a' | MI+'a'+'b'-'a'.
I am not sure why the inverse transition function is called the first time a NULL is encountered. Overall, I don't understand the idea behind the inverse transition function.
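One way to read that row: the frame for row 4 covers rows 3 and 4 (b, NULL), while the previous frame covered rows 2 and 3 (a, b). Moving the frame forward drops 'a' (hence the -'a'), and because the functions are strict, the incoming NULL is skipped. The toy model below reproduces the trace for partition 1. It is a simplified sketch, not PostgreSQL's executor: in particular, the rule "reset the state once no non-NULL input remains in the frame" is an assumption inferred from the output shown above, and `moving_strict_trace` is a hypothetical name.

```python
def moving_strict_trace(values, preceding, initcond=None):
    """Trace a strict moving aggregate over ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW."""
    out = []
    state = initcond
    count = 0  # non-NULL inputs currently folded into the state
    head = 0   # index of the first row of the current frame
    for cur, v in enumerate(values):
        new_head = max(0, cur - preceding)
        while head < new_head:
            old = values[head]
            if old is not None:        # strict: NULL inputs are never passed in
                count -= 1
                if count == 0:
                    state = initcond   # assumed reset: nothing left to aggregate
                else:
                    state = f"{state}-'{old}'"   # inverse transition
            head += 1
        if v is not None:
            # With a NULL state, a strict function takes the first input as the state.
            state = v if state is None else f"{state}+'{v}'"
            count += 1
        out.append(state)
    return out


partition_1 = [None, "a", "b", None, None, "c"]
without_initcond = moving_strict_trace(partition_1, 1)
with_initcond = moving_strict_trace(partition_1, 1, initcond="MI")
```

In this model the inverse function is not triggered by the NULL itself; it runs because the row leaving the frame ('a') was non-NULL.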
Quote from CREATE AGGREGATE:
minvfunc
The name of the inverse state transition function to be used in moving-aggregate mode. This function has the same argument and result
types as msfunc, but it is used to remove a value from the current
aggregate state, rather than add a value to it. The inverse transition
function must have the same strictness attribute as the forward state
transition function.
I searched the mailing list for the keyword minvfunc; there were no hits.
update
The question is now different. I am trying to understand the following quoted passage (manual chapter 38.12.1, Moving-Aggregate Mode): the difference in computation with and without an inverse transition function.
Without an inverse transition function, the window function mechanism
must recalculate the aggregate from scratch each time the frame
starting point moves, resulting in run time proportional to the number
of input rows times the average frame length. With an inverse
transition function, the run time is only proportional to the number
of input rows.
Let's say the window frame is
WINDOW w AS (PARTITION BY p ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
I assume the following is how the computation proceeds with an inverse transition function:
ordered_value sum_aggregate
a a
b a+b
c a+b+c-a
d a+b+c+d-a-b
e a+b+c+d+e-a-b-c
f a+b+c+d+e+f-a-b-c-d
So the question is: does the table above describe the computation with an inverse transition function? If it does, how is the result computed without one?
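The quoted passage can be illustrated by counting transition-function calls for a moving sum. The sketch below is a simplified model under stated assumptions, not PostgreSQL's actual implementation: without an inverse function the state is rebuilt from scratch whenever the frame start moves, and with one, each row is added once and removed at most once. Both function names are hypothetical.

```python
def moving_sum_recompute(values, preceding):
    """No inverse function: rebuild the frame whenever its starting point moves."""
    calls, out, state, head = 0, [], 0, 0
    for cur, v in enumerate(values):
        new_head = max(0, cur - preceding)
        if new_head != head:
            head, state = new_head, 0
            for x in values[head:cur + 1]:   # re-aggregate the whole frame
                state += x
                calls += 1
        else:
            state += v                       # frame only grew: one forward call
            calls += 1
        out.append(state)
    return out, calls


def moving_sum_inverse(values, preceding):
    """With an inverse function: one forward call per row, one inverse call per row that leaves."""
    calls, out, state, head = 0, [], 0, 0
    for cur, v in enumerate(values):
        new_head = max(0, cur - preceding)
        while head < new_head:
            state -= values[head]            # inverse transition
            calls += 1
            head += 1
        state += v                           # forward transition
        calls += 1
        out.append(state)
    return out, calls


values = list(range(1, 11))
slow, slow_calls = moving_sum_recompute(values, 3)
fast, fast_calls = moving_sum_inverse(values, 3)
```

Both variants return the same sums; only the number of calls differs. In this model the recomputing variant grows with rows times frame length, while the inverse variant stays below twice the number of rows.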

Extracting values from non-standard markup strings in PostgreSQL

Unfortunately, I have a table like the following:
DROP TABLE IF EXISTS my_list;
CREATE TABLE my_list (index int PRIMARY KEY, mystring text, status text);
INSERT INTO my_list
(index, mystring, status) VALUES
(12, '', 'D'),
(14, '[id] 5', 'A'),
(15, '[id] 12[num] 03952145815', 'C'),
(16, '[id] 314[num] 03952145815[name] Sweet', 'E'),
(19, '[id] 01211[num] 03952145815[name] Home[oth] Alabama', 'B');
Is there any trick to extract the number following [id] as an integer from the mystring text shown above? That is, as though I ran the following query:
SELECT index, extract_id_function(mystring), status FROM my_list;
and got results like:
12 0 D
14 5 A
15 12 C
16 314 E
19 1211 B
Preferably with only simple string functions; if that isn't possible, a regular expression will be fine.
If I understand correctly, you have a rather unconventional markup format where [id] is followed by a space, then a series of digits that represents a numeric identifier. There is no closing tag, the next non-numeric field ends the ID.
If so, you're going to be able to do this with non-regexp string ops, but only quite badly. What you'd really need is the SQL equivalent of strtol, which consumes input up to the first non-digit and just returns that. A cast to integer will not do that, it'll report an error if it sees non-numeric garbage after the number. (As it happens I just wrote a C extension that exposes strtol for decoding hex values, but I'm guessing you don't want to use C extensions if you don't even want regex...)
It can be done with string ops if you make the simplifying assumption that an [id] nnnn tag always ends with either end of string or another tag, so it's always [ at the end of the number. We also assume that you're only interested in the first [id] if multiple appear in a string. That way you can write something like the following horrible monstrosity:
select
    "index",
    case
        when next_tag_idx > 0 then substring(cut_id from 0 for next_tag_idx)
        else cut_id
    end AS "my_id",
    "status"
from (
    select
        position('[' in cut_id) AS next_tag_idx,
        *
    from (
        select
            case
                when id_offset = 0 then null
                -- '[id] ' is 5 characters long, so the digits start at id_offset + 5
                else substring(mystring from id_offset + 5)
            end AS cut_id,
            *
        from (
            select
                position('[id] ' in mystring) AS id_offset,
                *
            from my_list
        ) x
    ) y
) z;
(If anybody ever actually uses that query for anything, kittens will fall from the sky and splat upon the pavement, wailing in horror all the way down).
Or you can be sensible and just use a regular expression for this kind of string processing, in which case your query (assuming you only want the first [id]) is:
regress=> SELECT
"index",
coalesce((SELECT (regexp_matches(mystring, '\[id\]\s?(\d+)'))[1])::integer, 0) AS my_id,
status
FROM my_list;
 index | my_id | status
-------+-------+--------
    12 |     0 | D
    14 |     5 | A
    15 |    12 | C
    16 |   314 | E
    19 |  1211 | B
(5 rows)
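For comparison, the same pattern can be tried outside the database. The Python sketch below reuses the regular expression from the query above on the sample rows; `extract_id` is a hypothetical helper named after the function the question wished for, and defaulting to 0 mirrors the coalesce.

```python
import re

# Same pattern as in the SQL: "[id]", an optional whitespace character, then digits.
ID_PATTERN = re.compile(r"\[id\]\s?(\d+)")


def extract_id(mystring: str) -> int:
    """Return the first [id] number as an integer, or 0 when there is none."""
    match = ID_PATTERN.search(mystring)
    return int(match.group(1)) if match else 0


my_list = [
    (12, "", "D"),
    (14, "[id] 5", "A"),
    (15, "[id] 12[num] 03952145815", "C"),
    (16, "[id] 314[num] 03952145815[name] Sweet", "E"),
    (19, "[id] 01211[num] 03952145815[name] Home[oth] Alabama", "B"),
]
result = [(index, extract_id(mystring), status) for index, mystring, status in my_list]
```

As with the cast to integer in SQL, converting the captured digits to an int drops the leading zero of 01211.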
Update: If you're having issues with unicode handling in regex, upgrade to Pg 9.2. See https://stackoverflow.com/a/14293924/398670

Split a string and populate a table for all records in table in SQL Server 2008 R2

I have a table EmployeeMoves:
| EmployeeID | CityIDs
+------------------------------
| 24 | 23,21,22
| 25 | 25,12,14
| 29 | 1,2,5
| 31 | 7
| 55 | 11,34
| 60 | 7,9,21,23,30
I'm trying to figure out how to expand the comma-delimited values from the EmployeeMoves.CityIDs column to populate an EmployeeCities table, which should look like this:
| EmployeeID | CityID
+------------------------------
| 24 | 23
| 24 | 21
| 24 | 22
| 25 | 25
| 25 | 12
| 25 | 14
| ... and so on
I already have a function called SplitADelimitedList that splits a comma-delimited list of integers into a rowset. It takes the delimited list as a parameter. The SQL below will give me a table with split values under the column Value:
select value from dbo.SplitADelimitedList ('23,21,1,4');
| Value
+-----------
| 23
| 21
| 1
| 4
The question is: How do I populate EmployeeCities from EmployeeMoves with a single (even if complex) SQL statement using the comma-delimited list of CityIDs from each row in the EmployeeMoves table, but without any cursors or looping in T-SQL? I could have 100 records in the EmployeeMoves table for 100 different employees.
This is how I tried to solve this problem. It seems to work and is very quick in performance.
INSERT INTO EmployeeCities
SELECT
em.EmployeeID,
c.Value
FROM EmployeeMoves em
CROSS APPLY dbo.SplitADelimitedList(em.CityIDs) c;
UPDATE 1:
This update provides the definition of the user-defined function dbo.SplitADelimitedList. This function is used in above query to split a comma-delimited list to table of integer values.
CREATE FUNCTION dbo.SplitADelimitedList
(
    @String NVARCHAR(MAX)
)
RETURNS @SplittedValues TABLE(
    Value INT
)
AS
BEGIN
    DECLARE @SplitLength INT
    DECLARE @Delimiter VARCHAR(10)
    SET @Delimiter = ',' -- set this to the delimiter you are using
    WHILE len(@String) > 0
    BEGIN
        -- length of the next token: up to the delimiter, or the rest of the string
        SELECT @SplitLength = (CASE charindex(@Delimiter, @String)
            WHEN 0 THEN
                datalength(@String) / 2
            ELSE
                charindex(@Delimiter, @String) - 1
            END)
        INSERT INTO @SplittedValues
        SELECT cast(substring(@String, 1, @SplitLength) AS INTEGER)
        WHERE
            ltrim(rtrim(isnull(substring(@String, 1, @SplitLength), ''))) <> '';
        -- drop the consumed token and its delimiter
        SELECT @String = (CASE ((datalength(@String) / 2) - @SplitLength)
            WHEN 0 THEN
                ''
            ELSE
                right(@String, (datalength(@String) / 2) - @SplitLength - 1)
            END)
    END
    RETURN
END
Preface
This is not the right way to do it. You shouldn't create comma-delimited lists in SQL Server. This violates first normal form, which should sound like an unbelievably vile expletive to you.
It is trivial for a client-side application to select rows of employees and related cities and display this as a comma-separated list. It shouldn't be done in the database. Please do everything you can to avoid this kind of construction in the future. If at all possible, you should refactor your database.
The Right Answer
To get the list of cities, properly expanded, from a table containing lists of cities, you can do this:
INSERT dbo.EmployeeCities
SELECT
M.EmployeeID,
C.CityID
FROM
EmployeeMoves M
CROSS APPLY dbo.SplitADelimitedList(M.CityIDs) C
;
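Conceptually, CROSS APPLY calls the table-valued function once per row of EmployeeMoves and pairs that row with each value the function returns. The Python sketch below models only that row-by-row expansion on the sample data from the question; `split_delimited_list` is a hypothetical stand-in for dbo.SplitADelimitedList, not its actual definition.

```python
def split_delimited_list(text, delimiter=","):
    """Split a delimited list of integers, skipping empty tokens."""
    return [int(token) for token in text.split(delimiter) if token.strip()]


employee_moves = [
    (24, "23,21,22"),
    (25, "25,12,14"),
    (29, "1,2,5"),
    (31, "7"),
    (55, "11,34"),
    (60, "7,9,21,23,30"),
]

# One output row per (employee, city) pair, like CROSS APPLY over the split function.
employee_cities = [
    (employee_id, city_id)
    for employee_id, city_ids in employee_moves
    for city_id in split_delimited_list(city_ids)
]
```

A row whose list is empty produces no output rows at all, which is also how CROSS APPLY behaves when the applied function returns nothing.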
The Wrong Answer
I wrote this answer due to a misunderstanding of what you wanted: I thought you were trying to query against properly-stored data to produce a list of comma-separated CityIDs. But I realize now you wanted the reverse: to query the list of cities using existing comma-separated values already stored in a column.
WITH EmployeeData AS (
SELECT
M.EmployeeID,
M.CityID
FROM
dbo.SplitADelimitedList ('23,21,1,4') C
INNER JOIN dbo.EmployeeMoves M
ON Convert(int, C.Value) = M.CityID
)
SELECT
E.EmployeeID,
CityIDs = Substring((
SELECT ',' + Convert(varchar(max), CityID)
FROM EmployeeData C
WHERE E.EmployeeID = C.EmployeeID
FOR XML PATH (''), TYPE
).value('.[1]', 'varchar(max)'), 2, 2147483647)
FROM
(SELECT DISTINCT EmployeeID FROM EmployeeData) E
;
Part of my difficulty in understanding is that your question is a bit disorganized. Next time, please clearly label your example data and show what you have, and what you're trying to work toward. Since you put the data for EmployeeCities last, it looked like it was what you were trying to achieve. It's not a good use of people's time when questions are not laid out well.

Inserting many rows in treelike structure in SQL SERVER 2005

I have a few tables in SQL that are pretty much like this
A        B        C
ID_A     ID_B     ID_C
Name     Name     Name
         ID_A     ID_B
As you can see, A is linked to B and B to C. Those are basically tables that contain data models. Now, I need to be able to create data based on those tables. For example, if I have the following data
A            B                C
1 Name1      1 SubName1 1     1 SubSubName1 1
2 Name2      2 SubName2 1     2 SubSubName2 1
             3 SubName3 2     3 SubSubName3 2
                              4 SubSubName4 3
                              5 SubSubName5 3
I would like to copy the 'content' of those tables into other tables. Of course, the auto-numeric keys generated when inserting into the new tables are different from the original ones, and I would like to be able to keep track of them so that I can copy the entire thing. The structure of the recipient tables contains more information than these, but it's mainly dates and other stuff that is easy for me to get.
I need to do this entirely in Transact-SQL (with built-in functions if needed). Is this possible, and can anyone give me a short example? I managed to do it for one level, but I get confused for the rest.
thanks
EDIT : The info above is just an example, because my actual diagram looks more like this
Model tables :
Processes -- (1-N) Steps -- (1-N) Task -- (0-N) TaskCheckList
                         -- (0-N) StepsCheckLists
whereas the tables I need to fill look like this
Client -- (0-N) Sequence -- (1-N) ClientProcesses -- (1-N) ClientSteps -- (1-N) ClientTasks -- (0-N) ClientTaskCheckList
                                                                       -- (0-N) ClientStepCheckLists
The Client already exists, and when I need to run the script, I create one sequence, which will contain all processes, which in turn contain their steps, tasks, etc...
Ok,
So I did a lot of trial and error, and here is what I got. It seems to work fine, although it looks quite big for something that seemed easy at first.
The whole thing is partly in French and partly in English because our client is French, and so are we anyway. It does insert all the data in all the tables that I needed. The only thing left will be the first lines, where I need to select the data to insert according to some parameters, but this is the easy part.
DECLARE @IdProcessusRenouvellement bigint
DECLARE @NomProcessus nvarchar(255)
SELECT @IdProcessusRenouvellement = ID FROM T_Ref_Processus WHERE Nom LIKE 'EXP%'
SELECT @NomProcessus = Nom FROM T_Ref_Processus WHERE Nom LIKE 'EXP%'
DECLARE @InsertedSequence table(ID bigint)
DECLARE @Contrats table(ID bigint,IdClient bigint,NumeroContrat nvarchar(255))
INSERT INTO @Contrats SELECT ID,IdClient,NumeroContrat FROM T_ClientContrat
DECLARE @InsertedIdsSeq as Table(ID bigint)
-- Work sequences
INSERT INTO T_ClientContratSequenceTravail(IdClientContrat,Nom,DateDebut)
OUTPUT Inserted.ID INTO @InsertedIdsSeq
SELECT ID, @NomProcessus + ' - ' + CONVERT(VARCHAR(10), GETDATE(), 120) + ' : ' + NumeroContrat ,GETDATE()
FROM @Contrats
-- Processes
DECLARE @InsertedIdsPro as Table(ID bigint,IdProcessus bigint)
INSERT INTO T_ClientContratProcessus
(IdClientContratSequenceTravail,IdProcessus,Nom,DateDebut,DelaiRappel,DateRappel,LienAvecProcessusRenouvellement,IdStatutProcessus,IdResponsable,Sequence)
OUTPUT Inserted.ID,Inserted.IdProcessus INTO @InsertedIdsPro
SELECT I.ID,P.ID,P.Nom,GETDATE(),P.DelaiRappel,GETDATE(),P.LienAvecProcessusRenouvellement,0,0,0
FROM @InsertedIdsSeq I, T_Ref_Processus P
WHERE P.ID = @IdProcessusRenouvellement
-- Steps
DECLARE @InsertedIdsEt as table(ID bigint,IdProcessusEtape bigint)
INSERT INTO T_ClientContratProcessusEtape
(IdClientContratProcessus,IdProcessusEtape,Nom,DateDebut,DelaiRappel,DateRappel,NomListeVerification,Sequence,IdStatutEtape,IdResponsable,IdTypeResponsable,ListeVerificationTermine)
OUTPUT Inserted.ID,Inserted.IdProcessusEtape INTO @InsertedIdsEt
SELECT I.ID,E.ID,
E.Nom,GETDATE(),E.DelaiRappel,GETDATE(),COALESCE(L.Nom,''),E.Sequence,0,0,E.IdTypeResponsable,0
FROM @InsertedIdsPro I INNER JOIN T_Ref_ProcessusEtape E ON I.IdProcessus = E.IdProcessus
LEFT JOIN T_Ref_ListeVerification L ON E.IdListeVerification = L.ID
-- Steps: checklist items
INSERT INTO T_ClientContratProcessusEtapeListeVerificationItem
(IdClientContratProcessusEtape,Nom,Requis,Verifie)
SELECT I.ID,IT.Nom,IT.Requis,0
FROM @InsertedIdsEt I
INNER JOIN T_Ref_ProcessusEtape E ON I.IdProcessusEtape = E.ID
INNER JOIN T_Ref_ListeVerificationItem IT ON E.IdListeVerification = IT.IdListeVerification
-- Tasks
DECLARE @InsertedIdsTa as table(ID bigint, IdProcessusEtapeTache bigint)
INSERT INTO T_ClientContratProcessusEtapeTache
(IdClientContratProcessusEtape,IdProcessusEtapeTache,Nom,DateDebut,DelaiRappel,DateRappel,NomListeVerification,Sequence,IdStatutTache,IdResponsable,IdTypeResponsable,ListeVerificationTermine)
OUTPUT Inserted.ID,Inserted.IdProcessusEtapeTache INTO @InsertedIdsTa
SELECT I.ID,T.ID,
T.Nom,GETDATE(),T.DelaiRappel,GETDATE(),COALESCE(L.Nom,''),T.Sequence,0,0,T.IdTypeResponsable,0
FROM @InsertedIdsEt I
INNER JOIN T_Ref_ProcessusEtapeTache T ON I.IdProcessusEtape = T.IdProcessusEtape
LEFT JOIN T_Ref_ListeVerification L ON T.IdListeVerification = L.ID
-- Tasks: checklist items
INSERT INTO T_ClientContratProcessusEtapeTacheListeVerificationItem
(IdClientContratProcessusEtapeTache,Nom,Requis,Verifie)
SELECT I.ID,IT.Nom,IT.Requis,0
FROM @InsertedIdsTa I
INNER JOIN T_Ref_ProcessusEtapeTache T ON I.IdProcessusEtapeTache = T.ID
INNER JOIN T_Ref_ListeVerificationItem IT ON T.IdListeVerification = IT.IdListeVerification
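The pattern in the script above is: at each level, record the pair (new id, template id) for every inserted row, then insert the next level by joining template children to those pairs. The Python sketch below shows that bookkeeping on the A/B/C example from the question. It is only an illustration of the idea; `copy_level` and the id counter starting at 100 are hypothetical, standing in for the identity columns and the OUTPUT ... INTO table variables.

```python
from itertools import count


def copy_level(templates, inserted_parents, new_ids):
    """Copy one level of the hierarchy.

    templates:        rows (template_id, template_parent_id, name)
    inserted_parents: pairs (new_parent_id, template_parent_id) from the previous level
    Returns the new rows and the (new_id, template_id) pairs for the next level.
    """
    rows, inserted = [], []
    for new_parent_id, template_parent_id in inserted_parents:
        for template_id, parent_ref, name in templates:
            if parent_ref == template_parent_id:
                new_id = next(new_ids)
                rows.append((new_id, new_parent_id, name))
                inserted.append((new_id, template_id))
    return rows, inserted


table_a = [(1, "Name1"), (2, "Name2")]
table_b = [(1, 1, "SubName1"), (2, 1, "SubName2"), (3, 2, "SubName3")]
table_c = [(1, 1, "SubSubName1"), (2, 1, "SubSubName2"), (3, 2, "SubSubName3"),
           (4, 3, "SubSubName4"), (5, 3, "SubSubName5")]

new_ids = count(100)  # stand-in for auto-generated keys
inserted_a = [(next(new_ids), template_id) for template_id, _ in table_a]
rows_b, inserted_b = copy_level(table_b, inserted_a, new_ids)
rows_c, inserted_c = copy_level(table_c, inserted_b, new_ids)
```

Keeping the template id next to the generated id is what lets each level find its freshly inserted parent without cursors or loops over single rows.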

Howto design Tables for Navigating Hierarchical Regions with Diamond Structures

Our solution needs us to work in hierarchies of regions which are as follows.
           STATE
             |
          DISTRICT
             |
           TALUK
           /   \
          /     \
      HOBLI   PANCHAYAT
          \     /
           \   /
            \ /
          VILLAGE
There are 2 ways to navigate to a village from a Taluk. Either through HOBLI OR through PANCHAYAT.
We need a PK(non-business KEY) and a SERIAL_NUMBER/ID for each STATE, DISTRICT, TALUK, HOBLI, PANCHAYAT, VILLAGE; However, each village has 8 additional attributes.
How do I design this structure in PostgreSQL 8.4 ?
My previous experience was on Oracle so I'm wondering how to navigate hierarchical structures in PostgreSQL 8.4 ? If at all, the solution should be friendly for READ/navigation speed.
================================================================
Quassnoi : Here is a sample hierarchy
                 KARNATAKA
                     |
                     |
              TUMKUR (District)
                     |
                     |
                     |
              KUNIGAL (Taluk)
                 /       \
                /         \
               /           \
HULIYUR DURGA(Hobli)   CHOWDANAKUPPE(Panchayat)
               \           /
                \         /
                 \       /
                  \     /
                   \   /
          Voddarakempapura(Village)
          Ankanahalli(Village)
          Chowdanakuppe(Village)
          Yedehalli(Village)
NAVIGATE : For now, I will be presenting 2 separate UI screens each having separate navigable hierarchies
#1 using HOBLI and
So, for #1, I will need the entire tree starting from STATE, DISTRICT(s), TALUK(s), HOBLI(s), VILLAGE(s). Using the above tree, I will need
KARNATAKA (State)
|
|
|---TUMKUR (District)
    |
    |
    |-----KUNIGAL(Taluk)
          |
          |
          **|----HULIYUR DURGA(Hobli)**
               |
               |
               |---VODDARAKEMPAPURA(Village)
               |
               |---Yedehalli(Village)
               |
               |---Ankanahalli(Village)
#2 using PANCHAYAT.
So, for #2, I will need the entire tree starting from STATE, DISTRICT(s), TALUK(s), PANCHAYAT(s), VILLAGE(s)
KARNATAKA (state)
|
|
|---TUMKUR (District)
    |
    |
    |-----KUNIGAL(Taluk)
          |
          |
          **|----CHOWDANAKUPPE (Panchayat)**
               |
               |
               |---VODDARAKEMPAPURA(Village)
               |
               |---Ankanahalli(Village)
               |
               |---Chowdanakuppe(Village)
ResultSet
Should be able to create above Trees with the following details.
We need a PK(non-business KEY) and a SERIAL_NUMBER/ID for each STATE, DISTRICT, TALUK, HOBLI, PANCHAYAT, VILLAGE along with a Name and LEVEL of the relationship(similar to ORACLE'S LEVEL).
For now, getting the above ResultSet is OK. But in the future, we will need an ability to do reporting(some aggregation) at a HOBLI/PANCHAYAT/TALUK level.
=====================================
@Quassnoi #2,
Thank you very much,
"If you are planning to add some more hierarchy axes, it may be worth creating a separate table to store the hierarchies (with the axis field added) rather than adding the fields to the table."
Actually, I simplified the existing requirement so as NOT to confuse anyone. The actual hierarchy is like this
           STATE
             |
          DISTRICT
             |
           TALUK
           /   \
          /     \
      HOBLI   PANCHAYAT
          \     /
           \   /
            \ /
      REVENUE VILLAGE
             |
             |
         HABITATION
Sample data for such a hierarchy is like below
                 KARNATAKA
                     |
              TUMKUR (District)
                     |
              KUNIGAL (Taluk)
                 /       \
                /         \
HULIYUR DURGA(Hobli)   CHOWDANAKUPPE(Panchayat)
                \         /
                 \       /
          Thavarekere(Revenue Village)
                 /       \
Bommanahalli(habitation)   Tavarekere(Habitation)
Will anything in your solution below change by the above modification ?
Also, would you recommend that I create another Table like below to store the 7 properties of the Habitats ? Is there a better way to store such info ?
CREATE TABLE habitatDetails
(
    id BIGINT NOT NULL PRIMARY KEY,
    serialNumber BIGINT NOT NULL,
    habitatid BIGINT NOT NULL, -- we will add these details only for habitats
    prop1 VARCHAR(128),
    prop2 VARCHAR(128),
    prop3 VARCHAR(128),
    prop4 VARCHAR(128),
    prop5 VARCHAR(128),
    prop6 VARCHAR(128),
    prop7 VARCHAR(128),
    CONSTRAINT "habitatdetails_fk" FOREIGN KEY ("habitatid")
        REFERENCES "public"."t_hierarchy"("id")
);
Thank you,
CREATE TABLE t_hierarchy
(
id BIGINT NOT NULL PRIMARY KEY,
type VARCHAR(128) NOT NULL,
name VARCHAR(128) NOT NULL,
tax_parent BIGINT,
gov_parent BIGINT,
CHECK (NOT (tax_parent IS NULL AND gov_parent IS NULL))
);
CREATE INDEX ix_hierarchy_taxparent ON t_hierarchy (tax_parent);
CREATE INDEX ix_hierarchy_govparent ON t_hierarchy (gov_parent);
INSERT
INTO t_hierarchy
VALUES (1, 'State', 'Karnataka', 0, 0),
       (2, 'District', 'Tumkur', 1, 1),
       (3, 'Taluk', 'Kunigal', 2, 2),
       (4, 'Hobli', 'Huliyur Durga', 3, NULL),
       (5, 'Panchayat', 'Chowdanakuppe', NULL, 3),
       (6, 'Village', 'Voddarakempapura', 4, 5),
       (7, 'Village', 'Ankanahalli', 4, 5),
       (8, 'Village', 'Chowdanakuppe', 4, 5),
       (9, 'Village', 'Yedehalli', 4, 5);
CREATE OR REPLACE FUNCTION fn_hierarchy_tax(level INT, start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT $1, h
FROM t_hierarchy h
WHERE h.id = $2
UNION ALL
SELECT (f).*
FROM (
SELECT fn_hierarchy_tax($1 + 1, h.id) f
FROM t_hierarchy h
WHERE h.tax_parent = $2
) q;
$$
LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION fn_hierarchy_tax(start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT fn_hierarchy_tax(1, $1);
$$
LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION fn_hierarchy_gov(level INT, start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT $1, h
FROM t_hierarchy h
WHERE h.id = $2
UNION ALL
SELECT (f).*
FROM (
SELECT fn_hierarchy_gov($1 + 1, h.id) f
FROM t_hierarchy h
WHERE h.gov_parent = $2
) q;
$$
LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION fn_hierarchy_gov(start BIGINT)
RETURNS TABLE (level INT, h t_hierarchy)
AS
$$
SELECT fn_hierarchy_gov(1, $1);
$$
LANGUAGE 'sql';
SELECT ht.level, (ht.h).*
FROM fn_hierarchy_tax(1) ht;
SELECT ht.level, (ht.h).*
FROM fn_hierarchy_gov(1) ht;
The main idea is to keep two parents in two different fields, and use CONNECT BY emulation (rather than recursive CTE) functionality to preserve the order.
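The two-parent idea can be sketched independently of SQL. The Python below walks the same sample rows depth-first along one parent column at a time and reports a level, similar to Oracle's LEVEL. It is an illustration only; `walk` and the axis names "tax" and "gov" are hypothetical and simply mirror the tax_parent and gov_parent columns.

```python
# (id, type, name, tax_parent, gov_parent): the same rows as t_hierarchy above.
hierarchy = [
    (1, "State", "Karnataka", 0, 0),
    (2, "District", "Tumkur", 1, 1),
    (3, "Taluk", "Kunigal", 2, 2),
    (4, "Hobli", "Huliyur Durga", 3, None),
    (5, "Panchayat", "Chowdanakuppe", None, 3),
    (6, "Village", "Voddarakempapura", 4, 5),
    (7, "Village", "Ankanahalli", 4, 5),
    (8, "Village", "Chowdanakuppe", 4, 5),
    (9, "Village", "Yedehalli", 4, 5),
]

PARENT_COLUMN = {"tax": 3, "gov": 4}


def walk(rows, start, axis, level=1):
    """Yield (level, type, name) depth-first, following only the chosen parent column."""
    column = PARENT_COLUMN[axis]
    node = next(row for row in rows if row[0] == start)
    yield level, node[1], node[2]
    for row in rows:
        if row[column] == start:
            yield from walk(rows, row[0], axis, level + 1)


tax_tree = list(walk(hierarchy, 1, "tax"))
gov_tree = list(walk(hierarchy, 1, "gov"))
```

Each axis sees only its own intermediate level: the tax walk passes through the hobli and never visits the panchayat, and the gov walk does the opposite, while both reach the same villages.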
If you are planning to add some more hierarchy axes, it may be worth creating a separate table to store the hierarchies (with the axis field added) rather than adding the fields to the table.
Update:
Will anything in your solution below change by the above modification?
No, it will work alright.
By "axes" I mean hierarchy chains. Currently, you have two axes: the tax hierarchy (through hoblis) and the government hierarchy (through panchayats), matching the tax_parent and gov_parent columns above. If you are planning to add more axes (which is of course improbable), you may consider storing the hierarchies in another table and adding an "axis" field to that table. Again, it's very improbable that you will want to do this; I only mention the possibility for other readers who may have a similar problem.
Also, would you recommend that I create another Table like below to store the 7 properties of the Habitats ? Is there a better way to store such info ?
Yes, keeping them in a separate table is a good idea.