Validating LTREE hierarchies in PostgreSQL - postgresql

I'm new to hierarchies in general and LTREE in particular. As I've been converting and loading a column of text-based hierarchies into an LTREE column, I noticed a poorly-formatted string.
create table test_tree(id int, path ltree);
insert into test_tree values (1, '1');
insert into test_tree values (1, '1.1');
insert into test_tree values (1, '1.2.0'); --should be '1.2'
insert into test_tree values (1, '1.2.1');
insert into test_tree values (1, '1.2.2.0'); --should be '1.2.2'
insert into test_tree values (1, '1.2.2.1');
insert into test_tree values (1, '1.2.2.2');
This results in some unexpected behavior.
select path from test_tree where path <@ '1';
returns descendants, i.e.:
1
1.1
1.2.0
1.2.1
1.2.2.0
1.2.2.1
1.2.2.2
Whereas:
select path from test_tree where path @> '1.2.2.2';
only returns
1.2.2.2
I would expect <@ '1' to return results consistent with @> '1.2.2.2'. In this instance, how can an ancestor know its descendants, but a descendant not know its ancestors? Why does <@ '1' return all offspring (seemingly ignoring the missing '1.2.2') but @> '1.2.2.2' return no ancestors?
Moreover, how can I find these missing relationships in LTREE datatypes?

The ltree operators do not care which values exist in your table. They only compare two ltree values: '1' @> '1.2.2.2' is true, '1.2.2.2' @> '1.2.2.2' is true, '1.1' @> '1.2.2.2' is false.
But a SELECT query returns only the rows that actually exist in your table. '1.2' @> '1.2.2.2' and '1.2.2' @> '1.2.2.2' would be true as well; however, those two values do not exist in your table, so they cannot be found. The @>/<@ operators do not construct new rows.
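To see that the operators work on the two values alone, you can evaluate them without touching any table. This is a minimal sketch that only assumes the ltree extension is installed; the column aliases are made up for illustration.

```sql
-- Assumes: CREATE EXTENSION IF NOT EXISTS ltree;
-- @> means "left is an ancestor of (or equal to) right".
SELECT '1'::ltree       @> '1.2.2.2'::ltree AS root_is_ancestor,    -- true
       '1.2.2'::ltree   @> '1.2.2.2'::ltree AS parent_is_ancestor,  -- true, even though no row stores '1.2.2'
       '1.2.2.2'::ltree @> '1.2.2.2'::ltree AS self_is_ancestor,    -- true
       '1.1'::ltree     @> '1.2.2.2'::ltree AS sibling_is_ancestor; -- false
```

No table is involved here, so whether '1.2.2' is stored anywhere has no effect on the result.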
To actually construct all possible ancestors of an ltree value, not just those that are part of your table, you can use
SELECT subpath(p, 0, generate_series(1, nlevel(p)))
You also seem to have assumed that using an ltree column implies a constraint that the parent value exists in the same column of the table. It does not: in relational databases rows are independent of each other. An ltree value is not a reference to another row plus the last label; it really is just a list of labels, and every row in your table stores the complete label path. Using a specific type for a column cannot introduce such a constraint. You'd have to do that yourself, either with a complicated foreign key from a generated column or with a trigger.
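A trigger for this could look roughly like the following. Treat it as a sketch, not a complete solution: the function and trigger names are invented here, it only checks inserts and path updates (deleting a parent row is not prevented), and it does not guard against concurrent transactions.

```sql
-- Hypothetical names; assumes the test_tree table from the question.
CREATE FUNCTION test_tree_require_parent() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Single-label paths are roots and have no parent to look for.
    IF nlevel(NEW.path) > 1
       AND NOT EXISTS (
           SELECT 1
           FROM test_tree
           -- the parent path is everything except the last label
           WHERE path = subpath(NEW.path, 0, nlevel(NEW.path) - 1)
       )
    THEN
        RAISE EXCEPTION 'parent path % of % does not exist',
            subpath(NEW.path, 0, nlevel(NEW.path) - 1), NEW.path;
    END IF;
    RETURN NEW;
END;
$$;

-- EXECUTE FUNCTION needs PostgreSQL 11 or later; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER test_tree_require_parent
BEFORE INSERT OR UPDATE OF path ON test_tree
FOR EACH ROW EXECUTE FUNCTION test_tree_require_parent();
```

With this in place, the insert of '1.2.0' from the question would be rejected, because no row with path '1.2' exists at that point.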
How can I find these missing relationships in LTREE datatypes?
You can find such missing rows in your table (not in the datatype) using
SELECT path, array_agg(id) AS required_by
FROM (
SELECT id, subpath(path, 0, generate_series(1, nlevel(path)-1)) AS path
FROM test_tree
ORDER BY path, id
) AS all_parent_paths
WHERE NOT EXISTS (SELECT * FROM test_tree WHERE path = all_parent_paths.path)
GROUP BY path

Related

How to insert values from a select query

How do I insert the std_id value and the sub_id value into the student_subjects table?
insert into student_subjects(student_id,subject_id)
values(std_id,(select id from subjects
where guid in
(select * from
unnest(string_to_array(subjects_colls,',')::uuid[])))::int);
ERROR: more than one row returned by a subquery used as an expression
Get rid of the values clause and use the SELECT directly as the source for the INSERT statement.
You also don't need to unnest your array; using = any() will be a bit more efficient (although I would recommend that you pass an array of uuid directly rather than a comma-separated string):
insert into student_subjects(student_id,subject_id)
select std_id, s.id
from subjects s
where guid = any(string_to_array(subjects_colls,',')::uuid[])
I assume this is part of a procedure or function and std_id and subjects_colls are parameters passed to it.
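If it is a function, passing the array directly might look like the sketch below. The function name and parameter names are made up, and the column types (integer ids, a uuid guid column) are assumptions based on the casts in the question.

```sql
-- Hypothetical wrapper; takes a uuid[] instead of a comma-separated string.
CREATE FUNCTION add_student_subjects(p_student_id int, p_subject_guids uuid[])
RETURNS void
LANGUAGE sql
AS $$
    INSERT INTO student_subjects (student_id, subject_id)
    SELECT p_student_id, s.id
    FROM subjects s
    WHERE s.guid = ANY (p_subject_guids);  -- one row inserted per matching subject
$$;

-- Example call; the GUIDs are placeholders.
SELECT add_student_subjects(
    1,
    ARRAY['00000000-0000-0000-0000-000000000001',
          '00000000-0000-0000-0000-000000000002']::uuid[]
);
```

This way no string_to_array() call or cast from text is needed inside the function.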

Output Inserted.id equivalent in Postgres

I am new to PostgreSQL and trying to convert mssql scripts to Postgres.
For the MERGE statement, we can use INSERT ... ON CONFLICT DO UPDATE or DO NOTHING, but I am using the statement below and am not sure whether it is the correct way.
MSSQL code:
Declare @tab2 table(New_Id int not null, Old_Id int not null)
MERGE Tab1 as Target
USING (select * from Tab1
WHERE ColumnId = @ID) as Source on 0 = 1
when not matched by Target then
INSERT
(ColumnId
,Col1
,Col2
,Col3
)
VALUES (Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
)
OUTPUT INSERTED.Id, Source.Id into @tab2(New_Id, Old_Id);
Postgres Code:
Create temp table tab2(New_Id int not null, Old_Id int not null);
With source as( select * from Tab1
WHERE ColumnId = ID)
Insert into Tab1(ColumnId
,Col1
,Col2
,Col3
)
select Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
from source
My question is how to convert OUTPUT INSERTED.Id to Postgres. I need this id to insert records into another table (let's say child tables based on the values inserted into Tab1).
In PostgreSQL's INSERT statements you can choose what the query should return. From the docs on INSERT:
The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used). This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned.
Example (shortened form of your query):
WITH [...] INSERT INTO Tab1 ([...]) SELECT [...] FROM [...] RETURNING Tab1.id
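To use the returned ids for a follow-up insert, the INSERT ... RETURNING can be placed in a CTE. The sketch below assumes Tab1 has an Id column filled by a default, uses 42 as a stand-in for the ID parameter, and writes to a hypothetical tab1_child table. Note that RETURNING only sees columns of the inserted row, so unlike the MERGE version, Source.Id is not directly available here.

```sql
WITH inserted AS (
    INSERT INTO Tab1 (ColumnId, Col1, Col2, Col3)
    SELECT ColumnId, Col1, Col2, Col3
    FROM Tab1
    WHERE ColumnId = 42            -- 42 stands in for the ID parameter
    RETURNING Id                   -- the newly generated ids
)
INSERT INTO tab1_child (parent_id) -- hypothetical child table
SELECT Id
FROM inserted;
```

Both inserts run as a single statement, so the child rows are only written if the parent insert succeeds.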

CTE based insert of multiple rows into "one-per-group" table violates unique index

I have a table where only one row per group can be true.
This is enforced by a partial unique index (which can't be deferred).
CREATE TABLE test
(
id SERIAL PRIMARY KEY,
my_group INTEGER,
last BOOLEAN DEFAULT TRUE
);
CREATE UNIQUE INDEX "test.last" ON test (my_group) WHERE last;
INSERT INTO test (my_group)
VALUES (1), (2);
I'm trying to insert a new row into this table that shall replace the "last" element of the corresponding group. I also want to accomplish this in a single statement.
With some CTE trickery I'm able to do this: link to Fiddle
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1)
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
So far so good, the insert happens... no conflicts.
I don't quite understand why this works, since to my understanding all CTEs should read the same initial DB state and can't see the changes made by other CTEs.
The problem is now that I get a unique violation when I try to do the same with multiple rows at once: Link to Fiddle
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1), (2) -- <- difference to above query
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
-- Schema Error: error: duplicate key value violates unique constraint "test.last"
Is there any way to insert multiple rows with one statement? Can someone explain to me why the first query works and the second doesn't?
This was caused by PostgreSQL simplifying my always true clause:
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true)
was supposed to create a dependency between the main query and the CTE to enforce execution order from the main query's point of view.
It broke with more than one entry because the limit 1 allowed PostgreSQL to ignore the second row, as only one was required for evaluation.
I fixed it by comparing COUNT(*) > -1 instead:
COALESCE((SELECT COUNT(*) FROM uncheck_old_last) > -1, true)

Why does postgres group null values?

CREATE TEMP TABLE wirednull (
id bigint NOT NULL,
value bigint,
CONSTRAINT wirednull_pkey PRIMARY KEY (id)
);
INSERT INTO wirednull (id,value) VALUES (1,null);
INSERT INTO wirednull (id,value) VALUES (2,null);
SELECT value FROM wirednull GROUP BY value;
Returns one row, but I would expect two rows, since
SELECT *
FROM wirednull a
LEFT JOIN wirednull b
ON (a.value = b.value)
does not find any joins, because null!=null in postgres
According to the Wikipedia article on SQL:
When two nulls are equal: grouping, sorting, and some set operations
Because SQL:2003 defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct".[20] This definition of not distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other keywords that perform grouping) are used.
This wasn't the question:
Because null = null, or something = null, returns unknown rather than true or false.
So:
ON (a.value = b.value)
Doesn't match.
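The difference between plain equality and the "not distinct" comparison that grouping relies on can be checked directly. A small sketch using the wirednull table from the question; the column aliases are made up.

```sql
-- Plain equality yields NULL (unknown); IS NOT DISTINCT FROM treats two NULLs as the same.
SELECT NULL::bigint = NULL::bigint                    AS equals_result,        -- NULL
       NULL::bigint IS NOT DISTINCT FROM NULL::bigint AS not_distinct_result;  -- true

-- A join that matches NULLs the way GROUP BY does:
SELECT a.id AS a_id, b.id AS b_id
FROM wirednull a
JOIN wirednull b
  ON a.value IS NOT DISTINCT FROM b.value;
```

With the two sample rows, the second query returns four rows, because each NULL value is "not distinct" from both NULL values.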

Use result of postgres CTE in function

I am having difficulty using the results from a CTE in a function. Given the following Postgres table.
CREATE TABLE directory (
id SERIAL PRIMARY KEY
, name TEXT
, parent_id INTEGER REFERENCES directory(id)
);
INSERT INTO directory (name, parent_id)
VALUES ('Root', NULL), ('D1', 1), ('D2', 2), ('D3', 3);
I have this recursive CTE that returns the descendants of a directory.
WITH RECURSIVE tree AS (
SELECT id
FROM directory
WHERE parent_id = 2
UNION ALL
SELECT directory.id
FROM directory, tree
WHERE directory.parent_id = tree.id
)
The returned values are what I expect and can be made to equal an array
SELECT (SELECT array_agg(id) FROM tree) = ARRAY[3, 4];
I can use an array to select values from the table
SELECT * FROM directory WHERE id = ANY(ARRAY[3, 4]);
However, I cannot use the results of the CTE to accomplish the same thing.
SELECT * FROM directory WHERE id = ANY(SELECT array_agg(id) FROM tree);
The resulting error indicates that there is a type mismatch.
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
However, I am unsure how to correctly accomplish this.
Use:
SELECT *
FROM directory
WHERE id = ANY(SELECT unnest(array_agg(id)) FROM tree);
See detailed explanation in this answer.
Using unnest() in a subquery is a general method for dealing with arrays:
where id = any(select unnest(some_array))
Because array_agg() and unnest() are inverse operations, the query can be as simple as:
SELECT *
FROM directory
WHERE id = ANY(SELECT id FROM tree);
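Put together with the recursive CTE from the question, the whole statement would look something like this. It is a sketch against the directory table defined above. The original attempt fails because ANY over a subquery compares id with each row the subquery returns, and array_agg() makes that row an integer[] rather than an integer.

```sql
WITH RECURSIVE tree AS (
    -- anchor: direct children of directory 2
    SELECT id
    FROM directory
    WHERE parent_id = 2
    UNION ALL
    -- recursive step: children of rows already in tree
    SELECT d.id
    FROM directory d
    JOIN tree t ON d.parent_id = t.id
)
SELECT *
FROM directory
WHERE id = ANY (SELECT id FROM tree);
```

With the sample data this returns the rows with ids 3 and 4, matching the array comparison in the question.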