How to write a query that filters traversed elements - orientdb

I'm struggling with the following query. For a family tree database, I have a vertex 'Person' and a lightweight edge 'Child'; the edge goes out of a parent and into a child (i.e. 'child-of'). From a person, I need to get their siblings who share the exact same parents.
I can get all of a person's siblings fairly easily, as follows:
SELECT
FROM (
TRAVERSE out_Child
FROM (
SELECT expand(in_Child)
FROM #11:3
)
WHILE $depth <= 1
)
WHERE $depth = 1
So this gets the parents of the person in question, then gets all the children of the parents. The results might look like the following
@rid in_Child
#11:2 #11:0
#11:3 #11:0, #11:1
#11:4 #11:0, #11:1
#11:5 #11:1
I need to filter these results though, as I only want records that have the exact same parents as #11:3. So in this instance, the query should only return #11:3 and #11:4. If the query were for #11:5, it should return #11:5 only. So basically, the in_Child fields must be the same.
I've tried all sorts of queries such as the following, but the query either doesn't run or doesn't filter.
SELECT
FROM (
SELECT
FROM (
TRAVERSE out_Child
FROM (
SELECT expand(in_Child)
FROM #11:3
)
WHILE $depth <= 1
)
WHERE $depth = 1
)
LET $testinChild = (SELECT expand(in_Child) FROM #11:3)
WHERE in_Child CONTAINSALL $testinChild
Ultimately I would prefer not to use any sub-queries, but if they're required then so be it. I also tried to use the traversedElement(0) function, but it only returns the first record traversed (i.e. #11:0, but not #11:1), so it can't be used.
Update:
If you copy-paste the following into the OrientDB console (change the password etc. to suit your setup), you will have the same dataset described above.
create database remote:localhost/persondb root pass memory graph
alter database custom useLightweightEdges=true
create class Person extends V
create property Person.name string
create class Child extends E
create vertex Person set name = "Father"
create vertex Person set name = "Mother"
create vertex Person set name = "Child of father only"
create edge Child from #11:0 to #11:2
create vertex Person set name = "Child of father+mother #1"
create edge Child from #11:0 to #11:3
create edge Child from #11:1 to #11:3
create vertex Person set name = "Child of father+mother #2"
create edge Child from #11:0 to #11:4
create edge Child from #11:1 to #11:4
create vertex Person set name = "Child of mother only"
create edge Child from #11:1 to #11:5

Okay, I've found some solutions.
First of all, the way I used CONTAINSALL in the question is not correct, as was pointed out to me. CONTAINSALL does not check that all the items on the 'right' are in the 'left'; it loops over each item in the 'left' and evaluates the expression on the 'right' against that item. So WHERE in_Child CONTAINSALL (sex = 'Male') will filter for records where all of the in_Child records are male (i.e. no females). It's basically checking that in_Child[0:n].sex = 'Male'.
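As a rough illustration of the semantics just described (not OrientDB's implementation), CONTAINSALL behaves like a "for all" test over the left-hand collection. The Python sketch below uses a hypothetical `contains_all` helper and made-up parent records with a `sex` field:

```python
def contains_all(left, condition):
    """Return True when every item of `left` satisfies `condition`."""
    return all(condition(item) for item in left)


# Hypothetical parent records, for illustration only.
parents_mixed = [
    {"name": "Father", "sex": "Male"},
    {"name": "Mother", "sex": "Female"},
]
parents_male = [{"name": "Father", "sex": "Male"}]


def is_male(person):
    return person["sex"] == "Male"


mixed_result = contains_all(parents_mixed, is_male)  # one parent is female
male_result = contains_all(parents_male, is_male)    # every parent is male
print(mixed_result, male_result)
```

The right-hand side is a condition applied to each left-hand item, not a second collection to compare against.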
So I tried this query;
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
WHERE
(SELECT expand(in('Child')) from #11:3) CONTAINSALL (@rid in $current.in_Child)
I think OrientDB might have a bug here. The above query returns #11:2, #11:3 and #11:4, which doesn't make sense to me. I changed this query slightly...
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
LET
$parents = (SELECT expand(in('Child')) from #11:3)
WHERE
$parents CONTAINSALL (@rid in $current.in_Child)
This works better. The above query correctly returns #11:3 and #11:4, but a query on #11:2 or #11:5 also incorrectly includes both #11:3 and #11:4. This makes sense, because it only checks that the parent rids of e.g. #11:2 (of which there is only one) are among the parents of the other records, which they are. So I added a check to ensure they have the same number of parents.
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
LET
$parents = (SELECT expand(in('Child')) from #11:3)
WHERE
$parents CONTAINSALL (@rid in $current.in_Child)
AND
$parents.size() = in('Child').size()
Now the query works correctly in all cases. However, I still wasn't happy with it. I abandoned the use of CONTAINSALL and eventually came up with the following...
SELECT
FROM (
SELECT
FROM (
TRAVERSE
out('Child')
FROM (
SELECT
expand(in('Child'))
FROM
#11:3
)
WHILE
$depth <= 1
)
WHERE
$depth = 1
)
LET
$parents = (SELECT expand(in('Child')) from #11:3)
WHERE
in_Child.asSet() = $parents.asSet()
This appears to be the best/safest option, and is the one I will use.
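To see why the subset-style check over-matches while the asSet() comparison does not, here is a small Python sketch over the sample dataset. The `parents_of` mapping and both helper functions are illustrative names, not part of OrientDB:

```python
# in_Child for each person in the sample dataset: child rid -> set of parent rids.
parents_of = {
    "#11:2": {"#11:0"},
    "#11:3": {"#11:0", "#11:1"},
    "#11:4": {"#11:0", "#11:1"},
    "#11:5": {"#11:1"},
}


def siblings_subset(rid):
    """Subset check only: every parent of `rid` must also be a parent of the candidate."""
    return sorted(c for c, p in parents_of.items() if parents_of[rid] <= p)


def siblings_exact(rid):
    """Set equality: the candidate must have exactly the same parents as `rid`."""
    return sorted(c for c, p in parents_of.items() if p == parents_of[rid])


print(siblings_subset("#11:2"))  # also picks up #11:3 and #11:4
print(siblings_exact("#11:2"))
print(siblings_exact("#11:3"))
print(siblings_exact("#11:5"))
```

Comparing the two sets for equality covers both the "contains all" and the "same number of parents" conditions in one step.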

UPDATE for a dynamic number of parents:
SELECT
distinct(@rid)
FROM
(SELECT
expand(intersect)
FROM
(SELECT
in('Child').out('Child') as intersect
FROM
#17:2))
LET $parentCount = (SELECT
in('Child').size() as size
FROM
#17:2)
WHERE
in('Child').size() = $parentCount.size[0]

Related

How to add a label based on dense rank value [duplicate]

Is there any way I can reference the inner dense rank results and give them appropriate labels, as I am trying to do? It seems that in T-SQL I cannot use ORDER BY in an inner query; it produces an error like:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
But then how do I attach appropriate labels to the dense rank results in the code below?
WITH basequery AS (
SELECT c.baseentityid,
c.team,
c.providerid,
-- rank() OVER (PARTITION BY c.baseentityid ORDER BY c.version DESC) AS rn,
t.vaccinedate,
t.vaccine,
t.vaccinesource,
t.num
FROM vr.child_registration c
CROSS APPLY ( VALUES (c.bcgsource,c.bcgdate,'bcg',1), (c.opv0source,c.opv0date,'opv0',2), (c.penta1source,c.penta1date,'penta1',3), (c.pcv1source,c.pcv1date,'pcv1',4), (c.rota1source,c.rota1date,'rota1',5), (c.opv1source,c.opv1date,'opv1',6), (c.penta2source,c.penta2date,'penta2',7), (c.pcv2source,c.pcv2date,'pcv2',8), (c.rota2source,c.rota2date,'rota2',9), (c.opv2source,c.opv2date,'opv2',10), (c.penta3source,c.penta3date,'penta3',11), (c.ipvsource,c.ipvdate,'ipv',12), (c.pcv3source,c.pcv3date,'pcv3',13), (c.opv3source,c.opv3date,'opv3',14), (c.measles1source,c.measles1date,'measles1',15), (c.tcvsource,c.tcvdate,'tcv',16), (c.ipv2source,c.ipv2date,'ipv2',17), (c.measles2source,c.measles2date,'measles2',18)) t(vaccinesource, vaccinedate, vaccine, num)
WHERE t.vaccinedate IS NOT NULL AND t.vaccinedate <> ''
)
SELECT aa.baseentityid,
aa.team,
aa.vaccinedate,
aa.vaccine,
aa.vaccinesource,
aa.num,
aa.providerid,
aa.visitrank,
CASE
WHEN aa.visitrank = 0 THEN 'external vaccination'
WHEN aa.visitrank = 1 THEN 'Enrollment'
ELSE 'Visitation'
END AS visittype,
CASE
WHEN aa.providerid LIKE '%vacc%' THEN 'Vacc'
ELSE 'Non-Vacc'
END AS providertype
FROM ( SELECT a.baseentityid,
a.team,
a.vaccinedate,
a.vaccine,
a.vaccinesource,
a.num,
a.providerid,
CASE
WHEN a.vaccinesource = 'vaccinatoradministered' THEN dense_rank() OVER (PARTITION BY a.baseentityid, (
CASE
WHEN a.vaccinesource = 'vaccinatoradministered' THEN 1
ELSE 0
END) ORDER BY (CONVERT(VARCHAR(10),a.vaccinedate,111)) )
ELSE 0
END AS visitrank
FROM basequery a
WHERE a.rn = 1
GROUP BY a.baseentityid, a.team, a.vaccinedate, a.vaccine, a.vaccinesource, a.num, a.providerid
ORDER BY a.baseentityid, a.num) aa;
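The labelling the query is aiming for can be sketched outside SQL. This is not a T-SQL fix, only an illustration of the intended logic: rows with vaccinesource 'vaccinatoradministered' get a dense rank by date within each baseentityid, every other row gets rank 0, and the rank is then mapped to a label. The `label_visits` function, the sample rows and the 'external' source value are assumptions made for the example.

```python
from collections import defaultdict


def label_visits(rows):
    """Attach visitrank and visittype to each row (illustrative sketch)."""
    # Collect the distinct administered dates per entity.
    dates = defaultdict(set)
    for r in rows:
        if r["vaccinesource"] == "vaccinatoradministered":
            dates[r["baseentityid"]].add(r["vaccinedate"])
    # Dense rank: equal dates share a rank and ranks have no gaps.
    rank_of = {
        eid: {d: i + 1 for i, d in enumerate(sorted(ds))}
        for eid, ds in dates.items()
    }
    labelled = []
    for r in rows:
        if r["vaccinesource"] == "vaccinatoradministered":
            rank = rank_of[r["baseentityid"]][r["vaccinedate"]]
        else:
            rank = 0
        if rank == 0:
            visittype = "external vaccination"
        elif rank == 1:
            visittype = "Enrollment"
        else:
            visittype = "Visitation"
        labelled.append({**r, "visitrank": rank, "visittype": visittype})
    return labelled


# Hypothetical rows; dates use yyyy/mm/dd so that string order is date order.
rows = [
    {"baseentityid": "child-1", "vaccine": "bcg", "vaccinedate": "2020/01/05", "vaccinesource": "vaccinatoradministered"},
    {"baseentityid": "child-1", "vaccine": "opv0", "vaccinedate": "2020/01/05", "vaccinesource": "vaccinatoradministered"},
    {"baseentityid": "child-1", "vaccine": "penta1", "vaccinedate": "2020/02/10", "vaccinesource": "vaccinatoradministered"},
    {"baseentityid": "child-1", "vaccine": "pcv1", "vaccinedate": "2019/12/01", "vaccinesource": "external"},
]
labels = [(r["vaccine"], r["visitrank"], r["visittype"]) for r in label_visits(rows)]
print(labels)
```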

How to use a recursive query in a subquery in PostgreSQL

I created a recursive query that returns a string of the product category hierarchy (a typical parent-child relation):
with recursive productCategoryHierarchy as (
--start with the "anchor" row
select
1 as "level",
pg1.id,
pg1.title,
pg1.parentproductgroup_id
from product_group pg1
where
pg1.id = '17e949b6-85b3-4c87-8f76-ad1e61ea01e1' --parameterize me
union all
-- Get child nodes
select
pch.level +1 as "level",
pg2.id,
pg2.title,
pg2.parentproductgroup_id
from product_group pg2
join productCategoryHierarchy pch on pch.parentproductgroup_id = pg2.id
)
-- Get hierarchy as string
select
CONCAT('',string_agg(productCategoryHierarchy.title, ' > '),'')
from productCategoryHierarchy;
Now I want to use this result in another query as a subquery so that I can use the created string as an attribute in the parent query. Is that possible in Postgres or is there another solution to get a hierarchical tree as string in an attribute?
Are you looking for something like this?
with recursive productcategoryhierarchy as (
...
), aggregated_values as (
select string_agg(productCategoryHierarchy.title, ' > ') as all_titles
from productCategoryHierarchy
)
select ..., (select all_titles from aggregated_values) as all_titles
from ... your main query goes here ..
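As a language-neutral illustration of what the recursive CTE computes, the Python sketch below walks from a group up through its parents and joins the titles with ' > ', then attaches that string to each row of a "main query". The `groups` and `products` data and the `hierarchy_path` helper are hypothetical, not taken from the question's schema.

```python
def hierarchy_path(groups, start_id):
    """Walk from `start_id` up through the parents and join the titles."""
    titles = []
    seen = set()  # guards against accidental cycles in the data
    current = start_id
    while current is not None and current not in seen:
        seen.add(current)
        title, parent_id = groups[current]
        titles.append(title)
        current = parent_id
    return " > ".join(titles)


# Hypothetical product groups: id -> (title, parent id).
groups = {
    "a": ("Electronics", None),
    "b": ("Computers", "a"),
    "c": ("Laptops", "b"),
}

# Hypothetical "main query" rows, each getting the path as an extra attribute.
products = [{"name": "UltraBook", "group_id": "c"}, {"name": "Tower PC", "group_id": "b"}]
with_paths = [{**p, "all_titles": hierarchy_path(groups, p["group_id"])} for p in products]
print(with_paths)
```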

Selecting parent records by filtering multiple fields of collection of links

I have been trying to figure this out for a couple of days now, but I can't come up with a query that gives me the correct results. The essence of the task is that I am trying to retrieve all the nodes of a graph that have children with attributes that satisfy multiple constraints. The issue is that a node may have multiple linked nodes, and when I try to apply criteria to restrict which nodes are returned by the query, the criteria need to be imposed against sets of nodes instead of individual nodes.
Let me explain the problem in more detail through an example. Here is a sample schema of companies and locations. Each company can have multiple locations.
create class company extends V;
create property company.name STRING;
create class location extends V;
create property location.name STRING;
create property location.type INTEGER;
create property location.inactive STRING;
Let me now create a couple of records to illustrate the problem I have.
create vertex company set name = 'Company1';
create vertex location set name = 'Location1', type = 5;
create vertex location set name = 'Location2', type = 7;
create edge from (select from company where name = 'Company1') to (select from location where name in ['Location1', 'Location2']);
create vertex company set name = 'Company2';
create vertex location set name = 'Location3', type = 6;
create vertex location set name = 'Location4', type = 5, inactive = 'Y';
create edge from (select from company where name = 'Company2') to (select from location where name in ['Location3','Location4']);
I want to retrieve all companies that either don't have a location of type 5 or have a location of type 5 that is inactive (inactive = 'Y'). The query that I tried initially is shown below. It doesn't work because $loc.type is evaluated against a collection of values instead of an individual record, so the is null test is not applied to the 'inactive' field of each location record but to the collection of 'inactive' values for each parent record. I tried sub-queries, the set function, append and so on, but I can't get it to work.
select from company let loc = out() where $loc.type not contains 5 or ($loc.type contains 5 and $loc.type is null)
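The intended filter, evaluated per location record rather than per collection of field values, can be sketched in Python over the sample data above. The `matches` helper is a hypothetical name, and it assumes that one inactive type-5 location is enough for a company to qualify; swap `any` for `all` if every type-5 location must be inactive.

```python
# Sample data from the question: company -> list of location records.
companies = {
    "Company1": [
        {"name": "Location1", "type": 5, "inactive": None},
        {"name": "Location2", "type": 7, "inactive": None},
    ],
    "Company2": [
        {"name": "Location3", "type": 6, "inactive": None},
        {"name": "Location4", "type": 5, "inactive": "Y"},
    ],
}


def matches(locations):
    """No type-5 location at all, or at least one type-5 location that is inactive."""
    type5 = [loc for loc in locations if loc["type"] == 5]
    return not type5 or any(loc["inactive"] == "Y" for loc in type5)


selected = sorted(name for name, locs in companies.items() if matches(locs))
print(selected)
```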
You can try with this query:
select expand($c)
let $a = ( select expand(out) from E where out.@class = "company" and in.@class="location" and in.type = 5 and in.inactive = "Y" ),
$b = ( select from company where 5 not in out("E").type ),
$c = unionAll($a,$b)
UPDATE
I have created this graph
You can use this query
select expand($f)
let $a = ( select from E where out.@class = "company" and in.@class="location" ),
$b = ( select expand(out) from $a where in.type = 5 and in.inactive = "Y"),
$c = ( select expand(out) from $a where in.type = 5 and in.inactive is null),
$d = difference($b,$c),
$e = ( select from company where 5 not in out("E").type ),
$f = unionAll($d,$e)
Hope it helps.
Try this query:
select expand($d) from company
let $a=(select from company where out().type <> 5 and name contains $parent.current.name),
$b=(select from company where out().type contains 5 and name=$parent.current.name),
$c=(select from company where out().inactive contains "Y" and name=$parent.current.name),
$d=unionall($a,intersect($b,$c))
Hope it helps,
Regards,
Michela

OrientDB Removing one result set from another using the difference() function

We are using version v.1.7-rc2 of OrientDB, embedded in our application, and I'm struggling to figure out a query for removing one set of results from another set of results.
For a simplified example, we have a class of type "A" which is organized in a directional hierarchy. The class has a "name" attribute defined as a string (referring to areas, regions, counties, cities, etc), and a "parent" edge defining a relationship from the child instances to the parent instances.
I was able to find the intersection of the result sets from the two sub-queries of my hierarchy using the intersect() function:
select expand( $1 )
LET $2 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%a%' ),
$3 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%o%' ),
$1 = intersect( $2, $3 )
I thought I could accomplish the opposite effect if I used the difference() function:
select expand( $1 )
LET $2 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%a%' ),
$3 = ( select from (traverse in('parent') from (select from A where name = 'Eastern')) where $depth > 0 and name like '%o%' ),
$1 = difference( $2, $3 )
but it returns zero records, even though the sub-queries for $2 and $3, when run separately, return record sets that overlap. What am I failing to understand? I've searched the forums and documentation, but haven't figured it out.
In the end, I want to take vertices found in one result set, and remove from it any vertices found in a second result set. I essentially want the analogous behavior of the SQL EXCEPT operator (https://en.wikipedia.org/wiki/Set_operations_%28SQL%29#EXCEPT_operator).
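For reference, the EXCEPT-style behaviour described here is plain set difference. A minimal Python sketch with hypothetical names standing in for the two sub-query results:

```python
# Hypothetical vertex names found under 'Eastern'.
names = ["Alabama", "Georgia", "Carolina", "Vermont"]

with_a = {n for n in names if "a" in n}  # stands in for $2 (name like '%a%')
with_o = {n for n in names if "o" in n}  # stands in for $3 (name like '%o%')

both = with_a & with_o    # what intersect($2, $3) is expected to return
a_only = with_a - with_o  # what an EXCEPT-style difference should return
print(sorted(both), sorted(a_only))
```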
Any ideas or directions would be extremely helpful!
Regards,
Andrew

search recursively for dead-ends in topological network table

I've been trying for weeks to figure this out:
I need to recursively search a topological network, OpenStreetMap streets in this case, for dead ends, and neighborhoods that hang from the rest of the network by only one edge. These are places where you might expect to see a no-exit sign if your city is considerate like that.
My table has a record for each edge in the network. Each edge has a 'target' and 'source' field, identifying the node to which that side of the edge is connected. I've added a boolean column called 'dangling' to indicate whether the edge has been identified as a dead-ending segment. I initialize this column as FALSE, assuming the best.
So far, I've been able to identify simple branching dead ends with the following SQL:
WITH node_counts AS ( -- get all unique nodes
SELECT target AS node FROM edge_table WHERE NOT dangling
UNION ALL
SELECT source AS node FROM edge_table WHERE NOT dangling),
single_nodes AS ( -- select only those that occur once
SELECT node
FROM node_counts
GROUP BY node
HAVING count(*) = 1
) --
UPDATE edge_table SET dangling = true
FROM single_nodes
WHERE node = target OR node = source;
I simply keep running this query until no rows are updated.
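The repeat-until-stable loop can be sketched in Python as follows. This is an in-memory illustration of the same idea, not a replacement for the SQL; the `mark_dangling` function and the small example graph are assumptions made for the sketch.

```python
from collections import Counter


def mark_dangling(edges):
    """Repeatedly mark edges that touch a node seen only once among non-dangling edges."""
    dangling = set()
    while True:
        # Count how often each node occurs as an endpoint of a non-dangling edge.
        counts = Counter()
        for edge in edges:
            if edge not in dangling:
                counts.update(edge)
        single_nodes = {node for node, c in counts.items() if c == 1}
        newly = {
            e for e in edges
            if e not in dangling and (e[0] in single_nodes or e[1] in single_nodes)
        }
        if not newly:  # nothing changed in this pass, so we are done
            return dangling
        dangling |= newly


# Hypothetical network: a triangle 1-2-3 with a two-edge tail 3-4-5.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
result = mark_dangling(edges)
print(sorted(result))
```

Only the tail is marked. A loop that hangs off the network by a single edge survives this pruning, which is the cul-de-sac case asked about next.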
The result looks like this (red is dangling = true):
http://i.stack.imgur.com/OE1rZ.png
Excellent! This is working great...but there are still cul-de-sac neighborhoods if you will, which are only connected to the larger network by one edge. How can I identify those?
My best guess is that I'm going to need a WITH RECURSIVE at some point, but that's about as far as my unmathematical mind will go. Can anyone point me in the right direction?
OK. Here's how I figured it out:
I decided that there was not a way, or at least not an easy way, to implement this in SQL alone. I ended up implementing Tarjan's Algorithm in PHP and SQL, creating a temporary nodes table which linked each node to a strongly connected subcomponent of the graph. Once that was done, I updated any segment touching a node which did not belong to the largest subcomponent as 'dangling'. All edges that start and end at nodes belonging to the largest subcomponent therefore belong to the main street network (not dangling).
Here's the code. Note that it can take a very long time to run on a large graph. It's also pretty hard on the working memory, but it worked for my purposes.
<?php
$username = '';
$password = '';
$database = '';
$edge_table = 'cincy_segments';
$v1 = 'target';
$v2 = 'source';
$dangling_boolean_field = 'dangling';
$edge_id_field = 'edge_id';
//global variables declared
$index = 0;
$component_index = 0;
$nodes = array();
$stack = array();
pg_connect("host=localhost dbname=$database user=$username password=$password");
// get vertices
echo "getting data from database\n";
$neighbors_query = pg_query("
WITH nodes AS (
SELECT DISTINCT $v1 AS node FROM $edge_table
UNION
SELECT DISTINCT $v2 AS node FROM $edge_table
),
edges AS (
SELECT
node,
$edge_id_field AS edge
FROM nodes JOIN $edge_table
ON node = $v1 OR node = $v2
)
SELECT
node,
array_agg(CASE WHEN node = $v2 THEN $v1
WHEN node = $v1 THEN $v2
ELSE NULL
END) AS neighbor
FROM edges JOIN $edge_table ON
(node = $v2 AND edge = $edge_id_field) OR
(node = $v1 AND edge = $edge_id_field)
GROUP BY node");
// now make the results into php results
echo "putting the results in an array\n";
while($r = pg_fetch_object($neighbors_query)){ // for each node record
$nodes[$r->node]['id'] = $r->node;
$nodes[$r->node]['neighbors'] = explode(',',trim($r->neighbor,'{}'));
}
// create a temporary table to store results
pg_query("
DROP TABLE IF EXISTS temp_nodes;
CREATE TABLE temp_nodes (node integer, component integer);
");
// the big traversal
echo "traversing graph (this part takes a while)\n";
foreach($nodes as $id => $values){
// check the global array: $values may be a stale copy that never sees indexes set by tarjan()
if(!isset($nodes[$id]['index'])){
tarjan($id, 'no parent');
}
}
// identify dangling edges
echo "identifying dangling edges\n";
pg_query("
UPDATE $edge_table SET $dangling_boolean_field = FALSE;
WITH dcn AS ( -- DisConnected Nodes
-- get nodes that are NOT in the primary component
SELECT node FROM temp_nodes WHERE component != (
-- select the number of the largest component
SELECT component
FROM temp_nodes
GROUP BY component
ORDER BY count(*) DESC
LIMIT 1)
),
edges AS (
SELECT DISTINCT e.$edge_id_field AS disconnected_edge_id
FROM
dcn JOIN $edge_table AS e ON dcn.node = e.$v1 OR dcn.node = e.$v2
)
UPDATE $edge_table SET $dangling_boolean_field = TRUE
FROM edges WHERE $edge_id_field = disconnected_edge_id;
");
// clean up after ourselves
echo "cleaning up\n";
pg_query("DROP TABLE IF EXISTS temp_nodes;");
pg_query("VACUUM ANALYZE;");
// the recursive function definition
//
function tarjan($id, $parent)
{
global $nodes;
global $index;
global $component_index;
global $stack;
// mark and push
$nodes[$id]['index'] = $index;
$nodes[$id]['lowlink'] = $index;
$index++;
array_push($stack, $id);
// go through neighbors
foreach ($nodes[$id]['neighbors'] as $child_id) {
if ( !isset($nodes[$child_id]['index']) ) { // if neighbor not yet visited
// recurse
tarjan($child_id, $id);
// find lowpoint
$nodes[$id]['lowlink'] = min(
$nodes[$id]['lowlink'],
$nodes[$child_id]['lowlink']
);
} else if ($child_id != $parent) { // if already visited and not parent
// assess lowpoint
$nodes[$id]['lowlink'] = min(
$nodes[$id]['lowlink'],
$nodes[$child_id]['index']
);
}
}
// was this a root node?
if ($nodes[$id]['lowlink'] == $nodes[$id]['index']) {
$scc = array(); // nodes belonging to this component
do {
$w = array_pop($stack);
$scc[] = $w;
} while($id != $w);
// record results in table
pg_query("
INSERT INTO temp_nodes (node, component)
VALUES (".implode(','.$component_index.'),(',$scc).",$component_index)
");
$component_index++;
}
return NULL;
}
?>
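For readers who want to try the idea without a database, here is a compact Python sketch of the same lowlink-based traversal on an in-memory, undirected edge list. It is an illustration under assumptions, not a port of the script above: `components` and `dangling_edges` are hypothetical names, the example graph is made up, and the recursion is only suitable for small graphs.

```python
from collections import Counter, defaultdict


def components(edges):
    """Label each node with a component id using the lowlink traversal described above."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    index, low, comp, stack = {}, {}, {}, []
    counter = 0
    comp_id = 0

    def visit(node, parent):
        nonlocal counter, comp_id
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        for nb in adj[node]:
            if nb not in index:      # neighbor not yet visited: recurse
                visit(nb, node)
                low[node] = min(low[node], low[nb])
            elif nb != parent:       # already visited and not the parent
                low[node] = min(low[node], index[nb])
        if low[node] == index[node]:  # node is the root of a component
            while True:
                w = stack.pop()
                comp[w] = comp_id
                if w == node:
                    break
            comp_id += 1

    for node in list(adj):
        if node not in index:
            visit(node, None)
    return comp


def dangling_edges(edges):
    """Edges touching any node outside the largest component."""
    comp = components(edges)
    largest = Counter(comp.values()).most_common(1)[0][0]
    return {(u, v) for u, v in edges if comp[u] != largest or comp[v] != largest}


# Hypothetical network: a square 1-2-3-4 joined by one edge (4, 5) to a triangle 5-6-7.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (4, 5), (5, 6), (6, 7), (7, 5)]
print(sorted(dangling_edges(edges)))
```

In this example the triangle and the single edge that connects it to the square are all reported as dangling, because the square is the largest component.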
IMO it is not possible without loop detection (the dangling bit is a kind of breadcrumb loop detection). The query below runs on a forking Y-shape leading into two dead-end streets (1..4 and 11..14).
If you add the link between #19 back to #15, the recursion will not stop. (Maybe my logic is incorrect or incomplete?)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE edge_table
( source INTEGER NOT NULL
, target INTEGER NOT NULL
, dangling boolean NOT NULL DEFAULT False
);
INSERT INTO edge_table ( source, target) VALUES
(1,2) ,(2,3) ,(3,4)
,(11,12) ,(12,13) ,(13,14)
,( 15,16) ,(16,17) ,(17,18) ,( 18,19)
-- , (19,15) -- this will close the loop
, (19,1) -- Y-fork
, (19,11) -- Y-fork
;
-- EXPLAIN
WITH RECURSIVE cul AS (
SELECT e0.source AS source
, e0.target AS target
FROM edge_table e0
WHERE NOT EXISTS ( -- no way out ...
SELECT * FROM edge_table nx
WHERE nx.source = e0.target
)
UNION ALL
SELECT e1.source AS source
, e1.target AS target
FROM edge_table e1
JOIN cul ON cul.source = e1.target
WHERE 1=1
AND NOT EXISTS ( -- Only one incoming link; no *other* way to cul
SELECT * FROM edge_table nx
WHERE nx.target = cul.source
AND nx.source <> e1.source
)
)
SELECT * FROM cul
;
[ the CTE is of course intended to be used in an update statement to set the dangling fields ]