search recursively for dead-ends in topological network table - postgresql

I've been trying for weeks to figure this out:
I need to recursively search a topological network, OpenStreetMap streets in this case, for dead ends, and neighborhoods that hang from the rest of the network by only one edge. These are places where you might expect to see a no-exit sign if your city is considerate like that.
My table has a record for each edge in the network. Each edge has a 'target' and 'source' field, identifying the node to which that side of the edge is connected. I've added a binary column called 'dangling' to indicate whether the edge has been identified as a dea-ending segment. I initialize this column as FALSE, assuming the best.
So far, I've been able to get to identify simply branching dead-ends with the following SQL
WITH node_counts AS ( -- get all unique nodes
SELECT target AS node FROM edge_table WHERE NOT dangling
UNION ALL
SELECT source AS node FROM edge_table WHERE NOT dangling),
single_nodes AS ( -- select only those that occur once
SELECT node
FROM node_counts
GROUP BY node
HAVING count(*) = 1
) --
UPDATE edge_table SET dangling = true
FROM single_nodes
WHERE node = target OR node = source;
I simply keep running this query until no rows are updated.
The result looks like this(red is dangling = true):
http://i.stack.imgur.com/OE1rZ.png
Excellent! This is working great...but there are still cul-de-sac neighborhoods if you will, which are only connected to the larger network by one edge. How can I identify those?
My best guess is that I'm going to need a WITH RECURSIVE at some point, but that's about as far as my unmathmatical mind will go. Can anyone point me in the right direction?

OK. Here's how I figured it out:
I decided that there was not a way, or least not an easy way to implement this in SQL alone. I ended up implementing Tarjan's Algorithm in PHP and SQL, creating a temporary nodes table which linked each node to a strongly connected subcomponent of the graph. Once that was done, I updated any segment that was touching a node which did not belong to the largest subcomponent, as 'dangling'. All edges therefor that started and ended at nodes belonging to the largest subcomponent belong to the main street network (not dangling).
Here's the code. Note that it can take a very long time to run on a large graph. It's also pretty hard on the working memory, but it worked for my purposes.
<?php
$username = '';
$password = '';
$database = '';
$edge_table = 'cincy_segments';
$v1 = 'target';
$v2 = 'source';
$dangling_boolean_field = 'dangling';
$edge_id_field = 'edge_id';
//global variables declared
$index = 0;
$component_index = 0;
$nodes = array();
$stack = array();
pg_connect("host=localhost dbname=$database user=$username password=$password");
// get vertices
echo "getting data from database\n";
$neighbors_query = pg_query("
WITH nodes AS (
SELECT DISTINCT $v1 AS node FROM $edge_table
UNION
SELECT DISTINCT $v2 AS node FROM $edge_table
),
edges AS (
SELECT
node,
$edge_id_field AS edge
FROM nodes JOIN $edge_table
ON node = $v1 OR node = $v2
)
SELECT
node,
array_agg(CASE WHEN node = $v2 THEN $v1
WHEN node = $v1 THEN $v2
ELSE NULL
END) AS neighbor
FROM edges JOIN $edge_table ON
(node = $v2 AND edge = $edge_id_field) OR
(node = $v1 AND edge = $edge_id_field)
GROUP BY node");
// now make the results into php results
echo "putting the results in an array\n";
while($r = pg_fetch_object($neighbors_query)){ // for each node record
$nodes[$r->node]['id'] = $r->node;
$nodes[$r->node]['neighbors'] = explode(',',trim($r->neighbor,'{}'));
}
// create a temporary table to store results
pg_query("
DROP TABLE IF EXISTS temp_nodes;
CREATE TABLE temp_nodes (node integer, component integer);
");
// the big traversal
echo "traversing graph (this part takes a while)\n";
foreach($nodes as $id => $values){
if(!isset($values['index'])){
tarjan($id, 'no parent');
}
}
// identify dangling edges
echo "identifying dangling edges\n";
pg_query("
UPDATE $edge_table SET $dangling_boolean_field = FALSE;
WITH dcn AS ( -- DisConnected Nodes
-- get nodes that are NOT in the primary component
SELECT node FROM temp_nodes WHERE component != (
-- select the number of the largest component
SELECT component
FROM temp_nodes
GROUP BY component
ORDER BY count(*) DESC
LIMIT 1)
),
edges AS (
SELECT DISTINCT e.$edge_id_field AS disconnected_edge_id
FROM
dcn JOIN $edge_table AS e ON dcn.node = e.$v1 OR dcn.node = e.$v2
)
UPDATE $edge_table SET $dangling_boolean_field = TRUE
FROM edges WHERE $edge_id_field = disconnected_edge_id;
");
// clean up after ourselves
echo "cleaning up\n";
pg_query("DROP TABLE IF EXISTS temp_nodes;");
pg_query("VACUUM ANALYZE;");
// the recursive function definition
//
function tarjan($id, $parent)
{
global $nodes;
global $index;
global $component_index;
global $stack;
// mark and push
$nodes[$id]['index'] = $index;
$nodes[$id]['lowlink'] = $index;
$index++;
array_push($stack, $id);
// go through neighbors
foreach ($nodes[$id]['neighbors'] as $child_id) {
if ( !isset($nodes[$child_id]['index']) ) { // if neighbor not yet visited
// recurse
tarjan($child_id, $id);
// find lowpoint
$nodes[$id]['lowlink'] = min(
$nodes[$id]['lowlink'],
$nodes[$child_id]['lowlink']
);
} else if ($child_id != $parent) { // if already visited and not parent
// assess lowpoint
$nodes[$id]['lowlink'] = min(
$nodes[$id]['lowlink'],
$nodes[$child_id]['index']
);
}
}
// was this a root node?
if ($nodes[$id]['lowlink'] == $nodes[$id]['index']) {
do {
$w = array_pop($stack);
$scc[] = $w;
} while($id != $w);
// record results in table
pg_query("
INSERT INTO temp_nodes (node, component)
VALUES (".implode(','.$component_index.'),(',$scc).",$component_index)
");
$component_index++;
}
return NULL;
}
?>

IMO it is not possible without loop-detection. (the dangling bit is a kind of breadcrum-loopdetection). The below query is a forking Y-shape leading into two dead-end-streets (1..4 and 11..14).
If you add the link between #19 back to #15, the recursion will not stop. (Maybe my logic is incorrect or incomplete?)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE edge_table
( source INTEGER NOT NULL
, target INTEGER NOT NULL
, dangling boolean NOT NULL DEFAULT False
);
INSERT INTO edge_table ( source, target) VALUES
(1,2) ,(2,3) ,(3,4)
,(11,12) ,(12,13) ,(13,14)
,( 15,16) ,(16,17) ,(17,18) ,( 18,19)
-- , (19,15) -- this will close the loop
, (19,1) -- Y-fork
, (19,11) -- Y-fork
;
-- EXPLAIN
WITH RECURSIVE cul AS (
SELECT e0.source AS source
, e0.target AS target
FROM edge_table e0
WHERE NOT EXISTS ( -- no way out ...
SELECT * FROM edge_table nx
WHERE nx.source = e0.target
)
UNION ALL
SELECT e1.source AS source
, e1.target AS target
FROM edge_table e1
JOIN cul ON cul.source = e1.target
WHERE 1=1
AND NOT EXISTS ( -- Only one incoming link; no *other* way to cul
SELECT * FROM edge_table nx
WHERE nx.target = cul.source
AND nx.source <> e1.source
)
)
SELECT * FROM cul
;
[ the CTE is of course intended to be used in an update statement to set the dangling fields ]

Related

Selecting parent records by filtering multiple fields of collection of links

I have been trying to figure out this for a couple of days know but I can't come up with a query that gives me the correct results. The essence of the task is that I am trying to retrieve all the nodes of a graph that have children with attributes that satisfy multiple constraints. The issue I have is that a node may have multiple linked nodes and when I try to apply criteria to restrict which nodes must be returned by the query the criteria need to be imposed against sets of nodes instead of individual nodes.
Let me explain the problem in more detail through an example. Here is a sample schema of companies and locations. Each company can have multiple locations.
create class company extends V;
create property company.name STRING;
create class location extends V;
create property location.name STRING;
create property location.type INTEGER;
create property location.inactive STRING;
Let me now create a couple of records to illustrate the problem I have.
create vertex company set name = 'Company1';
create vertex location set name = 'Location1', type = 5;
create vertex location set name = 'Location2', type = 7;
create edge from (select from company where name = 'Company1') to (select from location where name in ['Location1', 'Location2']);
create vertex company set name = 'Company2';
create vertex location set name = 'Location3', type = 6;
create vertex location set name = 'Location4', type = 5, inactive = 'Y';
create edge from (select from company where name = 'Company2') to (select from location where name in ['Location3','Location4']);
I want to retrieve all companies that either don't have a location of type 5 or have a location of type 5 that is inactive (inactive = 'Y'). The query that I tried initially is shown below. It doesn't work because the $loc.type is evaluated against a collection of values instead of a individual record so the is null is not applied against the individual field 'inactive' of each location record but against the collection of values of the field 'inactive' for each parent record. I tried sub-queries, the set function, append and so on but I can't get it to work.
select from company let loc = out() where $loc.type not contains 5 or ($loc.type contains 5 and $loc.type is null)
You can try with this query:
select expand($c)
let $a = ( select expand(out) from E where out.#class = "company" and in.#class="location" and in.type = 5 and in.inactive = "Y" ),
$b = ( select from company where 5 not in out("E").type ),
$c = unionAll($a,$b)
UPDATE
I have created this graph
You can use this query
select expand($f)
let $a = ( select from E where out.#class = "company" and in.#class="location" ),
$b = ( select expand(out) from $a where in.type = 5 and in.inactive = "Y"),
$c = ( select expand(out) from $a where in.type = 5 and in.inactive is null),
$d = difference($b,$c),
$e = ( select from company where 5 not in out("E").type ),
$f = unionAll($d,$e)
Hope it helps.
Try this query:
select expand($d) from company
let $a=(select from company where out().type <> 5 and name contains $parent.current.name),
$b=(select from company where out().type contains 5 and name=$parent.current.name),
$c=(select from company where out().inactive contains "Y" and name=$parent.current.name),
$d=unionall($a,intersect($b,$c))
Hope it helps,
Regards,
Michela

FMP 14 - Auto Populate a Field based on a calculation

I am using FMP 14 and would like to auto-populate field A based on the following calulation:
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v1" ; "1st" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v2" ; "2nd" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v3" ; "3rd" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v4" ; "4th" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v5" ; "5th" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v6" ; "6th" )
The above code is supposed to auto-populate the value 1st, 2nd, 3rd ... in field A depending on the name of the object the Get (ActiveLayoutObjectName) function returns. I have named each of my tabs, but the calculation is only returning 0.
Any help would be appreciated.
thanks.
The way your calculation is written makes very little sense. Each one of the If() statements returns a result of either "1st", "2nd", etc. or nothing (an empty string). You are then applying or to all these results. Since only of them can be true, your calculation is essentially doing something like:
"" or "2nd" or "" or "" or "" or ""
which happens to return 1 (true), but has no useful meaning.
You should be using the Case() function here:
Case (
Get ( ActiveLayoutObjectName ) = "tab_Visits_v1" ; "1st" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v2" ; "2nd" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v3" ; "3rd" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v4" ; "4th" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v5" ; "5th" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v6" ; "6th"
)
Note also that a calculation field may not always refresh itself as a result of user switching a tab. This refers to an unstored calculation field; if you are trying to use this as the formula to be auto-entered into a "regular' (e.g. Text) field, it will never update.
Added:
Here is our situation. We see a patient a maximum of 6 times. We have
a tab for each one of those 6 visits.
I would suggest you use a portal to a related table of Visits instead of a tab control. A tab control is designed to display fixed components of user interface - not dynamic data. And certainly not data split into separate records. You should have only one unique record for each patient - and as many records for each patient's visits as may be necessary (potentially unlimited).
If you like, you can use the portal rows as buttons to select a specific visit to view in more detail (similar to a tab control, except that the portal shows the "tabs" as vertical rows). A one-row portal to the same Visits table, filtered by the user selection, would work very well for this purpose, I believe.
With 1. .... it would be easy:
Right (Get ( ActiveLayoutObjectName ) ; 1) & "."
Thanks for pointing out, that my first version does not work.

How to write a query that filters traversed elements

I'm struggling with the following query. For a family tree database, I have a vertex 'Person' and a lightweight edge 'Child', so the edge would go out of a parent and into a child (ie 'child-of'). From a person, I need to get their siblings who share the exact same parents.
I can get all of a persons siblings fairly easy, as follows;
SELECT
FROM (
TRAVERSE out_Child
FROM (
SELECT expand(in_Child)
FROM #11:3
)
WHILE $depth <= 1
)
WHERE $depth = 1
So this gets the parents of the person in question, then gets all the children of the parents. The results might look like the following
#rid in_Child
#11:2 #11:0
#11:3 #11:0, #11:1
#11:4 #11:0, #11:1
#11:5 #11:1
I need to filter these results though, as I only want records that have the exact same parents as #11:3. So in this instance, the query should only return #11:3 and #11:4. If the query were for #11:5, it should return #11:5 only. So basically, the in_Child fields must be the same.
I've tried all sorts of queries such as the following, but the query either doesnt run or doesnt filter.
SELECT
FROM (
SELECT
FROM (
TRAVERSE out_Child
FROM (
SELECT expand(in_Child)
FROM #11:3
)
WHILE $depth <= 1
)
WHERE $depth = 1
)
LET $testinChild = (SELECT expand(in_Child) FROM #11:3)
WHERE in_Child CONTAINSALL $testinChild
Ultimately I would prefer to not do any sub-queries, but if it's required then so be it. I Also tried to use traversedElement(0) function, but it only returns the first record traversed (ie #11:0, but not #11:1), so it can't be used.
Update;
If you copy-paste the following into orientdb console (change the password etc to suit your setup), you will have the same dataset described above.
create database remote:localhost/persondb root pass memory graph
alter database custom useLightweightEdges=true
create class Person extends V
create property Person.name string
create class Child extends E
create vertex Person set name = "Father"
create vertex Person set name = "Mother"
create vertex Person set name = "Child of father only"
create edge Child from #11:0 to #11:2
create vertex Person set name = "Child of father+mother #1"
create edge Child from #11:0 to #11:3
create edge Child from #11:1 to #11:3
create vertex Person set name = "Child of father+mother #2"
create edge Child from #11:0 to #11:4
create edge Child from #11:1 to #11:4
create vertex Person set name = "Child of mother only"
create edge Child from #11:1 to #11:5
Okay, I've found some solutions.
First of all, the way I used CONTAINSALL in the question is not correct, as pointed out to me here. CONTAINSALL does not check that all the items on the 'right' are in the 'left', but actually loops over each item in the 'left' and uses that item in the expression on the 'right'. SO WHERE in_Child CONTAINSALL (sex = 'Male) will filter for records where all of the in_Child records are only Male (ie no females). It's basically checking that in_Child[0:n].sex = 'Male'.
So I tried this query;
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
WHERE
(SELECT expand(in('Child')) from #11:3) CONTAINSALL (#rid in $current.in_Child)
I think OrientDB might have a bug here. The above query return #11:2, #11:3 and #11:4, which doesn't make sense to me. I changed this query slightly...
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
LET
$parents = (SELECT expand(in('Child')) from #11:3)
WHERE
$parents CONTAINSALL (#rid in $current.in_Child)
This works better. The above query correctly returns #11:3 and #11:4, but a query on #11:2 or #11:5 also incorrectly includes both #11:3 and #11:4. This makes sense, because it checking the parent rids of eg #11:2 (which is only 1) is in the parents of the rest, which they are. So I added a check to ensure they had the same amount of parents.
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
LET
$parents = (SELECT expand(in('Child')) from #11:3)
WHERE
$parents CONTAINSALL (#rid in $current.in_Child)
AND
$parents.size() = in('Child').size()
Now the query is working correctly for all instances. However, I still wasn't happy with this query. I abandonned the use of CONTAINSALL and eventually came up with the following...
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
LET
$parents = (SELECT expand(in('Child')) from #11:3)
WHERE
in_Child.asSet() = $parents.asSet()
This appears the best/safest, and is the one I will use.
UPDATE for dynamic number of parents :
SELECT
distinct(#rid)
FROM
(SELECT
expand(intersect)
FROM
(SELECT
in('Child').out('Child') as intersect
FROM
#17:2))
WHERE
in('Child').size() = $parentCount.size[0]
LET $parentCount = (SELECT
in('Child').size() as size
FROM
#17:2)

Additional conditions in JOIN

I have tables with articles and users, both have many-to-many mapping to third table - reads.
What I am trying to do here is to get all unread articles for particular user ( user_id not present in table reads ).
My query is getting all articles but those read are marked, which if fine as I can filter them out (user_id field contains id of user in question).
I have an SQL query like this:
SELECT articles.id, reads.user_id
FROM articles
LEFT JOIN
reads
ON articles.id = reads.article_id AND reads.user_id = 9
ORDER BY articles.last_update DESC LIMIT 5;
Which yields following:
articles.id | reads.user_id
-------------------+-----------------
57125839 | 9
57065456 |
56945065 |
56945066 |
56763090 |
(5 rows)
This is fine. This is what I want.
I'd like to get same result in Catalyst using my article model, but I cannot find any option to add conditions to a JOIN clause.
Do you know any way how to add AND X = Y to DBIx JOIN?
I know this can be done with custom resoult source and virtual view, but I have some other queries that could benefit from it and I'd like to avoid creating virtual view for each of them.
Thanks,
Canto
I don't even know what Catalyst is but I can hack the SQL query:
select articles.id, reads.user_id
from
articles
left join
(
select *
from reads
where user_id = 9
) reads on articles.id = reads.article_id
order by articles.last_update desc
limit 5;
I got an solution.
It's not straight forward, but it's better than virtual view.
http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Relationship/Base.pm#condition
Above describes how to use conditions in JOIN clause.
However, my case needs an variable in those conditions, which is not available by default in model.
So getting around a bit of model concept and introducing variable to it, we have the following.
In model file
our $USER_ID;
__PACKAGE__->has_many(
pindols => "My::MyDB::Result::Read",
sub {
my $args = shift;
die "no user_id specified!" unless $USER_ID;
return ({
"$args->{self_alias}.id" => { -ident => "$args->{foreign_alias}.article_id" },
"$args->{foreign_alias}.user_id" => { -ident => $USER_ID },
});
}
);
in controller
$My::MyDB::Result::Article::USER_ID = $c->user->id;
$articles = $channel->search(
{ "pindols.user_id" => undef } ,
{
page => int($page),
rows => 20,
order_by => 'last_update DESC',
prefetch => "pindols"
}
);
Will fetch all unread articles and yield following SQL.
SELECT me.id, me.url, me.title, me.content, me.last_update, me.author, me.thumbnail, pindols.article_id, pindols.user_id FROM (SELECT me.id, me.url, me.title, me.content, me.last_update, me.author, me.thumbnail FROM articles me LEFT JOIN reads pindols ON ( me.id = pindols.article_id AND pindols.user_id = 9 ) WHERE ( pindols.user_id IS NULL ) GROUP BY me.id, me.url, me.title, me.content, me.last_update, me.author, me.thumbnail ORDER BY last_update DESC LIMIT ?) me LEFT JOIN reads pindols ON ( me.id = pindols.article_id AND pindols.user_id = 9 ) WHERE ( pindols.user_id IS NULL ) ORDER BY last_update DESC: '20'
Of course you can skip the paging but I had it in my code so I included it here.
Special thanks goes to deg from #dbix-class on irc.perl.org and https://blog.afoolishmanifesto.com/posts/dbix-class-parameterized-relationships/.
Thanks,
Canto

OrientDB Removing one result set from another using the difference() function

We are using version v.1.7-rc2 of OrientDB, embedded in our application, and I'm struggling to figure out a query for removing one set of results from another set of results.
For a simplified example, we have a class of type "A" which is organized in a directional hierarchy. The class has a "name" attribute defined as a string (referring to areas, regions, counties, cities, etc), and a "parent" edge defining a relationship from the child instances to the parent instances.
I was able to find the intersection of the result sets from the two sub-queries of my hierarchy using the instance() function:
select expand( $1 ) LET $2 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%a%' ), $3 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%o%' ), $1 = intersect( $2, $3 )
I thought I could accomplish the opposite effect if I used the difference() function:
select expand( $1 ) LET $2 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%a%' ), $3 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%o%' ), $1 = difference( $2, $3 )
but it returns zero records, when the sub queries for $2 and $3 run separately return record sets that overlap. What am I failing to understand? I've searched the forums and documentation, but haven't figured it out.
In the end, I want to take vertices found in one result set, and remove from it any vertices found in a second result set. I essentially want the analogous behavior of the SQL EXCEPT operator (https://en.wikipedia.org/wiki/Set_operations_%28SQL%29#EXCEPT_operator).
Any ideas or directions would be extremely helpful!
Regards,
Andrew