Getting Breadcrumbs in Postgres

I have a table that represents a hierarchy by referencing itself.
create table nodes (
id integer primary key,
parent_id integer references nodes (id),
name varchar(255)
);
Given a specific node, I would like to find all of its parents in order, as breadcrumbs. For example, given this data:
insert into nodes (id,parent_id,name) values
(1,null,'Root'),
(2,1,'Left'),
(3,1,'Right'),
(4,2,'LeftLeft'),
(5,2,'LeftRight'),
(6,5,'LeftRightLeft');
If I wanted to start at id=5 I would expect the result to be:
id | depth | name
-- | ----- | ----
1 | 0 | 'Root'
2 | 1 | 'Left'
5 | 2 | 'LeftRight'
I don't care if the depth column is present, but I included it for clarity, to show that there should only be one result for each depth and that results should be in order of depth. I don't care if it's ascending or descending. The purpose of this is to be able to print out some breadcrumbs that look like this:
(1)Root \ (2)Left \ (5)LeftRight

The basic recursive query would look like this:
with recursive tree(id, name, parent_id) as (
select n.id, n.name, n.parent_id
from nodes n
where n.id = 5
union all
select n.id, n.name, n.parent_id
from nodes n
join tree t on (n.id = t.parent_id)
)
select *
from tree;
Demo: http://sqlfiddle.com/#!15/713f8/1
That will give you everything you need to rebuild the path from id = 5 back to the root.
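To get the rows in breadcrumb order, one option is to carry a step counter through the recursion and sort on it. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (the recursive CTE syntax used here is the same in both), and the `hops` column name is an assumption that does not appear in the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table nodes (
  id integer primary key,
  parent_id integer references nodes (id),
  name varchar(255)
);
insert into nodes (id, parent_id, name) values
  (1, null, 'Root'),
  (2, 1, 'Left'),
  (3, 1, 'Right'),
  (4, 2, 'LeftLeft'),
  (5, 2, 'LeftRight'),
  (6, 5, 'LeftRightLeft');
""")

# "hops" (hypothetical name) counts the steps walked up from the starting node,
# so the root ends up with the largest value.
rows = conn.execute("""
with recursive tree(id, name, parent_id, hops) as (
  select n.id, n.name, n.parent_id, 0
  from nodes n
  where n.id = ?
  union all
  select n.id, n.name, n.parent_id, t.hops + 1
  from nodes n
  join tree t on n.id = t.parent_id
)
select id, name
from tree
order by hops desc
""", (5,)).fetchall()

# Join the ordered rows into the breadcrumb string from the question.
breadcrumbs = " \\ ".join(f"({node_id}){name}" for node_id, name in rows)
print(breadcrumbs)
```

Sorting by the counter in descending order puts the root first; ascending order would give the path from the starting node up to the root instead.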

Related

Postgres: Query for list of ids in a mapping table and create If they don't exist

Assume we have the following table whose purpose is to autogenerate a numeric id for distinct (name, location) tuples:
CREATE TABLE mapping
(
id bigserial PRIMARY KEY,
name text NOT NULL,
location text NOT NULL
);
CREATE UNIQUE INDEX idx_name_loc ON mapping (name, location);
What is the most efficient way to query for a set of (name, location) tuples and autocreate any mappings that don't already exist, with all mappings (including the ones we created) being returned to the user?
My naive implementation would be something like:
SELECT id, name, location
FROM mapping
WHERE (name, location) IN ((name_1, location_1)...(name_n, location_n))
Then do something with the results in a programming language of my choice to work out which results are missing.
INSERT
INTO mapping (name, location)
VALUES (missing_name_1, missing_loc_1), ... (missing_name_n, missing_loc_n)
ON CONFLICT DO NOTHING
This gets the job done, but I get the feeling there's probably an approach that a) can be done in pure SQL and b) is more efficient.
You can use DISTINCT to get all possible values for the two columns, and CROSS JOIN to get their Cartesian product.
LEFT JOIN with the original table to get the actual records (if any):
CREATE TABLE mapping
( id bigserial PRIMARY KEY
, name text NOT NULL
, location text NOT NULL
, UNIQUE (name, location)
);
INSERT INTO mapping(name, location) VALUES ('Alice', 'kitchen'), ('Bob', 'bedroom' );
SELECT * FROM mapping;
SELECT n.name, l.location, m.id
FROM (SELECT DISTINCT name from mapping) n
CROSS JOIN (SELECT DISTINCT location from mapping) l
LEFT JOIN mapping m ON m.name = n.name AND m.location = l.location
;
Results:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 2
id | name | location
----+-------+----------
1 | Alice | kitchen
2 | Bob | bedroom
(2 rows)
name | location | id
-------+----------+----
Alice | kitchen | 1
Alice | bedroom |
Bob | kitchen |
Bob | bedroom | 2
(4 rows)
And if you want to physically INSERT the missing combinations:
INSERT INTO mapping(name, location)
SELECT n.name, l.location
FROM (SELECT DISTINCT name from mapping) n
CROSS JOIN (SELECT DISTINCT location from mapping) l
WHERE NOT EXISTS(
SELECT *
FROM mapping m
WHERE m.name = n.name AND m.location = l.location
)
;
SELECT * FROM mapping;
INSERT 0 2
id | name | location
----+-------+----------
1 | Alice | kitchen
2 | Bob | bedroom
3 | Alice | bedroom
4 | Bob | kitchen
(4 rows)
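For anyone who wants to try the "insert the missing combinations" step without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (an assumption; SQLite uses `integer primary key` where the answer uses `bigserial`). It runs the same CROSS JOIN plus NOT EXISTS statement against the two sample rows.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table mapping (
  id integer primary key,
  name text not null,
  location text not null,
  unique (name, location)
);
insert into mapping (name, location) values
  ('Alice', 'kitchen'),
  ('Bob', 'bedroom');
""")

# Insert every (name, location) combination that is not in the table yet.
conn.execute("""
insert into mapping (name, location)
select n.name, l.location
from (select distinct name from mapping) n
cross join (select distinct location from mapping) l
where not exists (
  select 1
  from mapping m
  where m.name = n.name and m.location = l.location
)
""")

# Sort in Python so the check does not depend on the order ids were assigned.
combos = sorted(conn.execute("select name, location from mapping").fetchall())
print(combos)
```

Only the (name, location) pairs are compared here, since the ids given to the two new rows depend on the order in which the join produces them.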

Postgres, get two row values that are both linked to the same ID

I have a rather tricky database problem that has really stumped me, and I would appreciate any help.
I have a table which includes data from multiple different sources. This data from different sources can be ‘duplicated’ and we have ways of identifying if that is the case.
Each row in the table has an ‘id’, and if it is identified as a duplicate of another row then we merge it, and it is given a ‘merged_into_id’ which refers to another row in the same table.
I am trying to run a report which will return information about where we have identified duplicates from two of those different sources.
Let's say I have three sources: A, B and C. I want to identify all of the duplicate rows between source A and source B.
I have got the query working fine to do this if a row from source A is directly merged into source B. However, we also have instances in the DB where source A row AND source B row are merged into source C. I am struggling with these and was hoping someone could help with that.
An example:
Original DB:
id | source | merged_into_id
-- | ------ | --------------
1 | A | 3
2 | B | 3
3 | C | NULL
What I would like to do is to be able to return id 1 and id 2 from that table, as they are both merged into the same ID e.g. like so:
source_a_id | source_b_id
----------- | -----------
1 | 2
But I'm really struggling to get to that - all I've managed to do is create a parent and child link like the following:
parent_id | child_id | child_source
--------- | -------- | ------------
3 | 1 | A
3 | 2 | B
I can also return just the IDs that I want, but they don't 'join' so to speak:
e.g.
SELECT
CASE WHEN child_source = 'A' THEN child_id END AS source_a_id,
CASE WHEN child_source = 'B' THEN child_id END AS source_b_id
But that just gives me a response with an empty row for the 'missing' data.
---EDIT---
Using array_agg and array_to_string I've gotten a little closer to what I need:
SELECT
parent.id as parent_id,
ARRAY_TO_STRING(
ARRAY_AGG(CASE WHEN child_source = 'A' THEN child.id END)
, ','
) a_id,
ARRAY_TO_STRING(
ARRAY_AGG(CASE WHEN child_source = 'B' THEN child.id END)
, ','
) b_id
but it's not quite the right format, as I can occasionally have multiple versions from each source, so I get a table that looks like:
parent_id | a_id | b_id
--------- | ---- | -----
3 | 1 | 2,4,5
In this case, I want to return a table that looks like:
parent_id | a_id | b_id
--------- | ---- | ----
3 | 1 | 2
3 | 1 | 4
3 | 1 | 5
Does anyone have any advice on getting to my desired output? Many thanks
Suppose that we have this table
select * from t;
id | source | merged_into_id
----+--------+----------------
1 | A | 3
2 | B | 3
3 | C |
5 | B | 3
4 | B | 3
(5 rows)
This should do the work
WITH B_source as (select * from t where source = 'B'),
A_source as (select * from t where source = 'A')
SELECT merged_into_id,A_source.id as a_id,B_source.id as b_id
FROM A_source
INNER JOIN B_source using (merged_into_id);
Result
merged_into_id | a_id | b_id
----------------+------+------
3 | 1 | 2
3 | 1 | 5
3 | 1 | 4
(3 rows)
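The same pairing can also be written as a plain self-join without the two CTEs. The sketch below is an illustration only, run through Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption); the aliases `a` and `b` are made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t (
  id integer primary key,
  source text,
  merged_into_id integer
);
insert into t (id, source, merged_into_id) values
  (1, 'A', 3),
  (2, 'B', 3),
  (3, 'C', null),
  (4, 'B', 3),
  (5, 'B', 3);
""")

# Pair each source-A row with every source-B row merged into the same target.
pairs = conn.execute("""
select a.merged_into_id, a.id as a_id, b.id as b_id
from t a
join t b on b.merged_into_id = a.merged_into_id
where a.source = 'A'
  and b.source = 'B'
order by b.id
""").fetchall()
print(pairs)
```

Rows whose merged_into_id is NULL never match in the join condition, so unmerged rows such as id 3 drop out without an extra filter.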

Find all multipolygons from one table within another

So, I've got two tables - PLUTO (pieces of land), and NYZMA (rezoning boundaries). They look like:
pluto nyzma
id | geom name | geom
-------------------- -------------------
1 | MULTIPOLYGON(x) A | MULTIPOLYGON(a)
2 | MULTIPOLYGON(y) B | MULTIPOLYGON(b)
And I want it to spit out something like this, assuming that PLUTO record 1 is in multipolygons A and B, and PLUTO record 2 is in neither:
pluto_id | nyzma_id
-------------------
1 | [A, B]
2 |
How do I, for every PLUTO record's corresponding geometry, cycle through each NYZMA record, and print the names of any whose geometry matches?
Join the two tables using the spatial function ST_Contains. Then use GROUP BY and ARRAY_AGG in the main query:
WITH subquery AS (
SELECT pluto.id, nyzma.name
FROM pluto LEFT OUTER JOIN nyzma
ON ST_Contains(nyzma.geom, pluto.geom)
)
SELECT id, array_agg(name) FROM subquery GROUP BY id;

How to find the last descendant (that matches other criteria) in a linear “ancestor-descendant” relationship

This question is based on the following question, but with an additional requirement: PostgreSQL: How to find the last descendant in a linear "ancestor-descendant" relationship
Basically, what I need is a PostgreSQL statement that finds the last descendant in a linear “ancestor-descendant” relationship that matches additional criteria.
Example:
Here is the content of table "RELATIONSHIP_TABLE":
id | id_ancestor | id_entry | bool_flag
---------------------------------------
1 | null | a | false
2 | 1 | a | false
3 | 2 | a | true
4 | 3 | a | false
5 | null | b | true
6 | null | c | false
7 | 6 | c | false
Every record within a particular hierarchy has the same "id_entry"
There are 3 different “ancestor-descendant” relationships in this example:
1. 1 <- 2 <- 3 <- 4
2. 5
3. 6 <- 7
Question PostgreSQL: How to find the last descendant in a linear "ancestor-descendant" relationship shows how to find the last record of each relationship. In the example above:
1. 4
2. 5
3. 7
So, what I need this time is the last descendant by "id_entry" whose "bool_flag" is set to true. In the example above:
1. 3
2. 5
3. <empty result>
Does anyone know a solution?
Thanks in advance :)
QStormDS
Graphs, trees, chains, etc represented as edge lists are usually good uses for recursive common table expressions - i.e. WITH RECURSIVE queries.
Something like:
WITH RECURSIVE walk(id, id_ancestor, id_entry, bool_flag, id_root, generation) AS (
SELECT id, id_ancestor, id_entry, bool_flag, id, 0
FROM RELATIONSHIP_TABLE
WHERE id_ancestor IS NULL
UNION ALL
SELECT x.id, x.id_ancestor, x.id_entry, x.bool_flag, walk.id_root, walk.generation + 1
FROM RELATIONSHIP_TABLE x INNER JOIN walk ON x.id_ancestor = walk.id
)
SELECT
id_entry, id_root, id
FROM (
SELECT
id, id_entry, bool_flag, id_root, generation,
max(CASE WHEN bool_flag THEN generation END ) OVER w as max_enabled_generation
FROM walk
WINDOW w AS (PARTITION BY id_root ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
) x
WHERE generation = max_enabled_generation;
... though it feels like there really should be a better way to do this than tracking how many generations we've walked down each path.
If id_entry is common for all members of a tree, you can avoid needing to track id_root. You should create a UNIQUE constraint on (id_entry, id) and a foreign key constraint FOREIGN KEY (id_entry, id_ancestor) REFERENCES RELATIONSHIP_TABLE (id_entry, id) to make sure that the ordering is consistent, then use:
WITH RECURSIVE walk(id, id_ancestor, id_entry, bool_flag, generation) AS (
SELECT id, id_ancestor, id_entry, bool_flag, 0
FROM RELATIONSHIP_TABLE
WHERE id_ancestor IS NULL
UNION ALL
SELECT x.id, x.id_ancestor, x.id_entry, x.bool_flag, walk.generation + 1
FROM RELATIONSHIP_TABLE x INNER JOIN walk ON x.id_ancestor = walk.id
)
SELECT
id_entry, id
FROM (
SELECT
id, id_entry, bool_flag, generation,
max(CASE WHEN bool_flag THEN generation END ) OVER w as max_enabled_generation
FROM walk
WINDOW w AS (PARTITION BY id_entry ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
) x
WHERE generation = max_enabled_generation;
Since this gives you a table of final descendants matched up with root parents, you can filter with a regular WHERE clause: just append AND bool_flag. If you instead want to exclude chains that have bool_flag set to false at any point along the way, you can add WHERE bool_flag in the RECURSIVE query's join.
SQLFiddle example: http://sqlfiddle.com/#!12/92a64/3
WITH RECURSIVE tail AS (
SELECT id AS opa
, id, bool_flag FROM boolshit
WHERE bool_flag = True
UNION ALL
SELECT t.opa AS opa
, b.id, b.bool_flag FROM boolshit b
JOIN tail t ON b.id_ancestor = t.id
)
SELECT *
FROM boolshit bs
WHERE bs.bool_flag = True
AND NOT EXISTS (
SELECT * FROM tail t
WHERE t.opa = bs.id
AND t.id <> bs.id
AND t.bool_flag = True
);
Explanation: select all records that have the bool_flag set, EXCEPT those that have offspring (direct or indirect) that have the bool_flag set, too. This effectively picks the last record of the chain that has the flag set.
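As a rough check of the NOT EXISTS approach against the sample data from the question, here is a sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. Several things are assumptions made for the example: the table is named relationship_table as in the question, the CTE column is called start_id, and bool_flag is stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table relationship_table (
  id integer primary key,
  id_ancestor integer,
  id_entry text,
  bool_flag integer  -- 0 = false, 1 = true
);
insert into relationship_table (id, id_ancestor, id_entry, bool_flag) values
  (1, null, 'a', 0),
  (2, 1, 'a', 0),
  (3, 2, 'a', 1),
  (4, 3, 'a', 0),
  (5, null, 'b', 1),
  (6, null, 'c', 0),
  (7, 6, 'c', 0);
""")

# tail: for every flagged row (start_id), walk down to all of its descendants.
# A flagged row is kept only if none of its descendants is flagged as well.
last_flagged = conn.execute("""
with recursive tail(start_id, id, bool_flag) as (
  select id, id, bool_flag
  from relationship_table
  where bool_flag = 1
  union all
  select t.start_id, r.id, r.bool_flag
  from relationship_table r
  join tail t on r.id_ancestor = t.id
)
select r.id_entry, r.id
from relationship_table r
where r.bool_flag = 1
  and not exists (
    select 1
    from tail t
    where t.start_id = r.id
      and t.id <> r.id
      and t.bool_flag = 1
  )
order by r.id
""").fetchall()
print(last_flagged)
```

Chain c has no flagged row at all, so it does not appear in the result, which matches the empty result expected in the question.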

How to make array_agg() work like group_concat() from MySQL

So I have this table:
create table test (
id integer,
rank integer,
image varchar(30)
);
Then some values:
id | rank | image
---+------+-------
1 | 2 | bbb
1 | 3 | ccc
1 | 1 | aaa
2 | 3 | c
2 | 1 | a
2 | 2 | b
I want to group them by id and concatenate the image names in the order given by rank. In MySQL I can do this:
select id,
group_concat( image order by rank asc separator ',' )
from test
group by id;
And the output would be:
1 aaa,bbb,ccc
2 a,b,c
Is there a way I can have this in PostgreSQL?
If I try to use array_agg(), the names do not show in the correct order, and I was not able to find a way to sort them. (I was using Postgres 8.4.)
In PostgreSQL 8.4 you cannot explicitly order array_agg, but you can work around it by ordering the rows passed into the group/aggregate with a subquery:
SELECT id, array_to_string(array_agg(image), ',')
FROM (SELECT * FROM test ORDER BY id, rank) x
GROUP BY id;
In PostgreSQL 9.0 and later, aggregate expressions can have an ORDER BY clause:
SELECT id, array_to_string(array_agg(image ORDER BY rank), ',')
FROM test
GROUP BY id;