OrientDB SQL Check if multiple pairs of vertices are connected - orientdb

I haven't been able to find an answer for the SQL for this.
Given pairs of vertices (record ids) and edge types between them, I want to check if all pairs exists.
V1 --E1--> V2
V3 --E2--> V4
... and so on. The answer I want is true / false or something equivalent. ALL connections must be present in order to evaluate to true, so at least one edge (of correct type) must exist for each pair.
Pseudo, the question would be:
Does V1 have edge <E1EdgeType> to V2?
AND
Does V3 have edge <E2EdgeType> to V4?
AND
... and so on
Does anyone know what the orientDB SQL would be to achieve this?
UPDATE
I did already have one way of checking if one single edge exists between known vertices. It's perhaps not very pretty either, but it works:
SELECT FROM (
SELECT EXPAND(out('TestEdge')) FROM #12:0
) WHERE #rid=#12:1
This will return the destination record (#12:0) if an edge of type 'TestEdge' exists from #12:0 to #12:1. However, if I have two of those, how can I query for one single result for both queries. Something like:
SELECT <something with $c> LET
$a = (SELECT FROM (SELECT EXPAND(out('TestEdge')) FROM #12:0) WHERE #rid=#12:1)
$b = (SELECT FROM (SELECT EXPAND(out('AnotherTestEdge')) FROM #12:2) WHERE #rid=#12:3)
$c = <something that checks that both a and b yield results>
That's what I aim towards doing. Please tell me if I'm solving this the wrong way. I'm not even sure what the gain is to merge queries like this compared to just repeat queries.

Given a pair of vertices, say #11:0 and #12:0, the following query will effectively check whether there is an edge of type E from #11:0
to #12:0
select from (select #this, out(E) from #11:0 unwind out) where out = #12:0
----+------+-----+-----
# |#CLASS|this |out
----+------+-----+-----
0 |null |#11:0|#12:0
----+------+-----+-----
This is highly inelegant and I would encourage you to think about formulating an enhancement request accordingly at https://github.com/orientechnologies/orientdb/issues
One way to incorporate the boolean tests you have in mind is illustrated by the following:
select from
(select $a.size() as a, $b.size() as b
let a=(select count(*) as e from (select out(E) from #11:0 unwind out)
where out = #12:0),
b=(select count(*) as e from (select out(E) from #11:1 unwind out)
where out = #12:2))
where a > 0 and b > 0
Yes, inelegance again :-(

It might be useful to you the following query
SELECT eval('sum($a.size(),$b.size())==2') as existing_edges
let $a = ( SELECT from TestEdge where out = #12:0 and in = #12:1 limit 1),
$b = ( SELECT from AnotherTestEdge where out = #12:2 and in = #12:3 limit 1)
Hope it helps.

Related

Cannot get a result by Max date

I'm trying to get the highlighted result only as it's the latest date. First time I've asked a question here so I apologize in advance if this isn't clear. Thanks
By using the following query
SELECT
MAX(A.Insp_Date) AS Last_Insp_Date
,A.Doc_ID
,A.Service_Call_ID
,A.Customer_ID
,A.Address_Code
,A.State
,A.Branch
,B.HydLoc
,B.FlwOutSz
,B.StaticPSI
,B.ResidualPSI
,B.PititPSI
,B.FlwGPM
FROM [dbo].[fofHydrntInspHdr] AS A
LEFT OUTER JOIN [dbo].[fofHYD2800FlwTstRT] AS B
ON A.Doc_ID = B.Doc_ID
WHERE A.Doc_ID > 0
AND A.Address_Code = 'GEN0021'
GROUP BY
A.Doc_ID
,A.Service_Call_ID
,A.Customer_ID
,A.Address_Code
,A.State
,A.Branch
,B.HydLoc
,B.FlwOutSz
,B.StaticPSI
,B.ResidualPSI
,B.PititPSI
,B.FlwGPM
I've also tried using max doc_id and it still doesn't work. Appreciate any help.
Another option that shouldn't require two scans of your table is to filter for the latest using a window function:
with r as
(
SELECT
A.Insp_Date AS Last_Insp_Date
,A.Doc_ID
,A.Service_Call_ID
,A.Customer_ID
,A.Address_Code
,A.State
,A.Branch
,B.HydLoc
,B.FlwOutSz
,B.StaticPSI
,B.ResidualPSI
,B.PititPSI
,B.FlwGPM
,DENSE_RANK() OVER (ORDER BY A.Insp_Date DESC) AS r
FROM [dbo].[fofHydrntInspHdr] AS A
LEFT OUTER JOIN [dbo].[fofHYD2800FlwTstRT] AS B
ON A.Doc_ID = B.Doc_ID
WHERE A.Doc_ID > 0
AND A.Address_Code = 'GEN0021'
)
SELECT
Insp_Date AS Last_Insp_Date
,Doc_ID
,Service_Call_ID
,Customer_ID
,Address_Code
,State
,Branch
,HydLoc
,FlwOutSz
,StaticPSI
,ResidualPSI
,PititPSI
,FlwGPM
FROM r
WHERE r = 1;
As an aside, I would advise against aliasing your tables with A, B, C etc as they don't relate to the table and make understanding the query later on more awkward. In this case, aliases like h and ft would convey that one table is the Headers and the other the Flow Tests, whilst also reducing character count.
It also looks like you have some bad duplication going on in your results there, which suggests that either your query is not joining and filtering appropriately or your data is messy.

EXISTS in filter returning too many values

I need to write a query that uses EXISTS, rather than IN, so that it will run fast. The filter is being fed so many parameter values that EXISTS seems like the only option. The difference is between a 20+ minute query and a 5 second query.
This is the query I have:
SELECT DISTINCT d.GROUP_NAME
FROM [EMPLOYEE] e JOIN [DATA_FACT] d ON (e.KEY = d.KEY)
WHERE d.DATE BETWEEN #Start and #End
AND EXISTS
(
select '1234567' -- #ID
)
AND e.Location IN (#Location)
ORDER BY d.GROUP_NAME ASC
The problem is that it is returning too many records. Based on the values I'm passing to filter on, I should get 1 row back but instead I am getting 28.
If I remove the EXISTS and add the following then I get the 1 record I need:
AND e.ID IN ('1234567')
Is there a way to fix the query to work with EXISTS so that I get the correct results?
This is essentially what you want if you are going to try to use exists to filter your data_fact table by parameters in your employee table. Not sure how much it's going to improve your performance though when you throw a massive number of employee IDs at it.
SELECT
d.GROUP_NAME
FROM [DATA_FACT] AS d
WHERE d.DATE BETWEEN #Start and #End
AND EXISTS
(
select 1
from EMPLOYEE AS e
WHERE d.[KEY] = e.[KEY]
AND e.[Location] IN (#Location)
AND e.ID IN ('1234567')
)
ORDER BY d.GROUP_NAME ASC

OrientDb Vertices with most Mutual Friends

I'm trying to find the pairs of vertices that have the greatest number of common vertices between them. It is very similar to the 'number of mutual friends' example used in many graph database demos. I can determine the number of mutual vertices between a pair of known vertices using this:
SELECT Expand($query) LET
$query1 = (SELECT Expand(outE().in) FROM #1:2,
$query2 = (SELECT Expand(OutE().in) FROM #1:3,
$query = Intersect($query1,$query2);
The Count() of the above query's result is the number of common vertices.
However, I can't figure out how to aggregate that query across my entire data set. My best solution has been a brute force, where I iterate through each vertex and run the above query against all other vertices (technically, I do all the vertices 'after' that vertex).
My solution is inefficient and had to be coded in C# rather than done entirely in SQL. How can this be done using OrientDb's SQL?
You can use a SELECT with a MATCH:
SELECT FROM (
SELECT a, b, count(friend) as nFriends from (
MATCH
{class:Person, as:a} -FriendOf- {as:friend} -FriendOf-{as:b, where:($matched.a != $currentMatch)}
RETURN a, b, friend
)
) ORDER BY nFriends
Slight modification to #Luigi's answer:
SELECT a, b, Count(friend) AS nFriends FROM (
MATCH
{class:Person, as:a} -E- {as:friend} -E- {class:Person, as:b, where:($matched.a != $currentMatch)}
RETURN a, b, friend
) GROUP BY a, b ORDER BY nFriends DESC
I needed the GROUP BY or I just get one big count.

PostgreSQL array_agg order for window functions

The answer to my question was almost here: PostgreSQL array_agg order
Except that I wanted to array_agg over a window function:
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name
order by c2.vocabulary_id, c2.concept_name)
over (partition by ca.min_levels_of_separation),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where
c.concept_code = '44054006'
order by min_levels_of_separation;
So, maybe this will work in some future version, but I get this error
ERROR: aggregate ORDER BY is not implemented for window functions
LINE 2: select distinct c.concept_name, array_agg(c2.vocabulary_id||...
^
I should probably be selecting from a subquery like the first answer to the quoted question above suggests. I was hoping for something as simple as the order by (in that question's second answer). Or maybe I'm just being lazy about the query and should be doing a group by instead of select distinct.
I did try putting the order by in the windowing function (over (partition by ca.min_levels_of_separation order by c2.vocabulary_id, c2.concept_name)), but I get these sort of repeated rows that way:
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus"}";1
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus","MedDRA:Diabetes mellitus (incl subtypes)"}";1
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus","MedDRA:Diabetes mellitus (incl subtypes)","SNOMED:Diabetes mellitus"}";1
(btw: http://www.ohdsi.org/ if you happen to be curious about where I got the medical vocabulary tables)
Yes, it does look like I was being muddle-headed and didn't need the window function. This seems to work:
select c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name
order by c2.vocabulary_id, c2.concept_name),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where c.concept_code = '44054006'
group by c.concept_name, ca.min_levels_of_separation
order by min_levels_of_separation
I won't accept my answer for a while since it just avoids the question instead of actually answering it, and someone might have something more useful to say on the matter.
like this :
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name ) over (partition by ca.min_levels_of_separation order by c2.vocabulary_id, c2.concept_name),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where
c.concept_code = '44054006'
order by min_levels_of_separation;

Greatest N per group in Open SQL

Selecting the rows from a table by (partial) key with the maximum value in a particular column is a common task in SQL. This question has some excellent answers that cover a variety of approaches to it. Unfortunately I'm struggling to replicate this in my ABAP program.
None of the commonly used approaches seem to be supported:
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons: SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
After trying each of these solutions in turn only to discover that ABAP doesn't seem to support them and being unable to find any equivalents I'm starting to think that I'll have no choice but to dump the data of the subquery to an itab.
What is the best practice for this common programming requirement in ABAP development?
First of all, specific requirement, would give you a better answer. As it happens I bumped into this question when working on a program, that uses 3 distinct methods of pseudo-grouping, (while looking for alternatives) and ALL 3 can be used to answer your question, depending on what exactly you need to do. I'm sure there are more ways to do it.
For instance, you can pull maximum values within a group by simply selecting max( your_field ) and grouping by some fields, if that's all you need.
select bname, nation, max( date_from ) from adrp group by bname, nation. "selects highest "from" date for each bname
If you need to use that max value as a filter condition within a query, you can do it by performing pseudo-grouping using sub-query and max within sub-query like this (notice how I move out the BNAME check into sub query, which means I don't have to check both fields using in (subquery) addition):
select ... from adrp as b_adrp "Pulls the latest person info for a user (some conditions are missing, but this is a part of an actual query)
where b_adrp~date_from in (
select max( date_from ) "Highest date_from where both dates are valid
from adrp where persnumber = b_adrp~persnumber and nation = b_adrp~nation and date_from <= #sy-datum )
The query above allows you to select selects all user info from base query and (where the first one only allows to take aggregated and grouped data).
Finally, If you need to check based on composite key and compare it to multiple agregate function results, the implementation will heavily depend on specifics of your requirement (and since your question has none, I'll provide a generic one). Easiest option is to use exists / not exists instead of in (subquery), in exact same way and form the subquery to check for existance of specific key or condition rather than pull a list ( you can nest subqueries if you have to ):
select * from bkpf where exists ( select 1 from bkpf as b where belnr = bkpf~belnr and gjahr = bkpf~gjahr group by belnr, gjahr having max( budat ) = bkpf~budat ) "Took an available example, that I had in testing program.
All 3 queries will get you max value of a column within a group and in fact, all 3 can use joins to achieve identical results.
please find my answers below your questions.
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Putting the subquery in your where condition should do the work SELECT * FROM X AS x INNER JOIN Y AS y ON x~a = y~b WHERE ( SELECT * FROM y WHERE ... )
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
You have to split your WHERE clause: SELECT * FROM X WHERE key1 IN ( SELECT key1 FROM y ) AND key2 IN ( SELECT key2 FROM y )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons.
Yes, thats right at the moment.
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons:
SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
This is not true. This SELECT is perfectly valid:
SELECT b1~budat
INTO TABLE lt_bkpf
FROM bkpf AS b1
LEFT JOIN bkpf AS b2
ON b2~belnr < b1~belnr
WHERE b1~bukrs <> ''.
And was valid at least since 7.40 SP08, since July 2013, so at the time you asked this question it was valid as well.