OrientDb Vertices with most Mutual Friends - orientdb

I'm trying to find the pairs of vertices that have the greatest number of common vertices between them. It is very similar to the 'number of mutual friends' example used in many graph database demos. I can determine the number of mutual vertices between a pair of known vertices using this:
SELECT Expand($query) LET
$query1 = (SELECT Expand(outE().in) FROM #1:2,
$query2 = (SELECT Expand(OutE().in) FROM #1:3,
$query = Intersect($query1,$query2);
The Count() of the above query's result is the number of common vertices.
However, I can't figure out how to aggregate that query across my entire data set. My best solution has been a brute force, where I iterate through each vertex and run the above query against all other vertices (technically, I do all the vertices 'after' that vertex).
My solution is inefficient and had to be coded in C# rather than done entirely in SQL. How can this be done using OrientDb's SQL?

You can use a SELECT with a MATCH:
SELECT FROM (
SELECT a, b, count(friend) as nFriends from (
MATCH
{class:Person, as:a} -FriendOf- {as:friend} -FriendOf-{as:b, where:($matched.a != $currentMatch)}
RETURN a, b, friend
)
) ORDER BY nFriends

Slight modification to #Luigi's answer:
SELECT a, b, Count(friend) AS nFriends FROM (
MATCH
{class:Person, as:a} -E- {as:friend} -E- {class:Person, as:b, where:($matched.a != $currentMatch)}
RETURN a, b, friend
) GROUP BY a, b ORDER BY nFriends DESC
I needed the GROUP BY or I just get one big count.

Related

Cannot get a result by Max date

I'm trying to get the highlighted result only as it's the latest date. First time I've asked a question here so I apologize in advance if this isn't clear. Thanks
By using the following query
SELECT
MAX(A.Insp_Date) AS Last_Insp_Date
,A.Doc_ID
,A.Service_Call_ID
,A.Customer_ID
,A.Address_Code
,A.State
,A.Branch
,B.HydLoc
,B.FlwOutSz
,B.StaticPSI
,B.ResidualPSI
,B.PititPSI
,B.FlwGPM
FROM [dbo].[fofHydrntInspHdr] AS A
LEFT OUTER JOIN [dbo].[fofHYD2800FlwTstRT] AS B
ON A.Doc_ID = B.Doc_ID
WHERE A.Doc_ID > 0
AND A.Address_Code = 'GEN0021'
GROUP BY
A.Doc_ID
,A.Service_Call_ID
,A.Customer_ID
,A.Address_Code
,A.State
,A.Branch
,B.HydLoc
,B.FlwOutSz
,B.StaticPSI
,B.ResidualPSI
,B.PititPSI
,B.FlwGPM
I've also tried using max doc_id and it still doesn't work. Appreciate any help.
Another option that shouldn't require two scans of your table is to filter for the latest using a window function:
with r as
(
SELECT
A.Insp_Date AS Last_Insp_Date
,A.Doc_ID
,A.Service_Call_ID
,A.Customer_ID
,A.Address_Code
,A.State
,A.Branch
,B.HydLoc
,B.FlwOutSz
,B.StaticPSI
,B.ResidualPSI
,B.PititPSI
,B.FlwGPM
,DENSE_RANK() OVER (ORDER BY A.Insp_Date DESC) AS r
FROM [dbo].[fofHydrntInspHdr] AS A
LEFT OUTER JOIN [dbo].[fofHYD2800FlwTstRT] AS B
ON A.Doc_ID = B.Doc_ID
WHERE A.Doc_ID > 0
AND A.Address_Code = 'GEN0021'
)
SELECT
Insp_Date AS Last_Insp_Date
,Doc_ID
,Service_Call_ID
,Customer_ID
,Address_Code
,State
,Branch
,HydLoc
,FlwOutSz
,StaticPSI
,ResidualPSI
,PititPSI
,FlwGPM
FROM r
WHERE r = 1;
As an aside, I would advise against aliasing your tables with A, B, C etc as they don't relate to the table and make understanding the query later on more awkward. In this case, aliases like h and ft would convey that one table is the Headers and the other the Flow Tests, whilst also reducing character count.
It also looks like you have some bad duplication going on in your results there, which suggests that either your query is not joining and filtering appropriately or your data is messy.

CosmosDB SQL: how to use ARRAY_CONCAT or ARRAY(select distinct ...) as aggregate function?

I have a collection in Azure CosmosDB where each document has 2 fields, say
{"a":1,"b":2}
{"a":1,"b":3}
{"a":2,"b":5}
and I want to return a set of possible values of B for each A, like this:
[{"a":1, "b":[2,3] },
{"a":2, "b":[5] }]
Trying SELECT c.a, ARRAY(SELECT DISTINCT c.b FROM c) FROM c GROUP BY c.a
I receive the error
Property reference 'c.b' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Same for ARRAY(select value c.b from c); and ARRAY_CONCAT does not look like an aggregate function.
Are there ways to do it without aggregating client-side?
Per my knowledge, this is not supported in Cosmos Db so far. Because Array is not aggregate function anyway which means it can't used with group by.
So, i suppose that you need to retrieve the data sort by a. Then loop the result to produce the b array mapping to the duplicate a.
For, Distinct values of b (output is in string format and values of b are comma separated)
SELECT a, String.Join(",", ARRAY_AGG(DISTINCT b)) AS list FROM c;
You can also use below, but the output format is not readable
SELECT a, ARRAY_AGG(DISTINCT b) AS list FROM c;

Esper distinct events on multiple attributes

I have a problem with the stream semantics in Esper. My aim is to output only events with pairwise distinct attributes. Additionally, there are temporal conditions which have to hold between the attributes (see Espers Interval Algebra Reference).
An example statement:
insert into output_stream select a.*, b.*
from stream1#length(100) as a, stream2#length(100) as b
where a.before(b) or a.meets(b) or a.overlaps(b)
Pairwise distinct attributes means, I want to ensure that there are no two outputs o1, o2 where o1.a = o2.a or o1.b = o2.b. To give a more concrete example, if there are the results
o1: (a = a1, b = b1),
o2: (a = a1, b = b2),
o3: (a = a2, b = b2),
o4: (a = a2, b = b1)
only two of them shall be output (e.g. o1 and o3 or o2 and o4). Which one does not matter for now.
I wanted to accomplish the pairwise distinct attributes with a NOT EXISTS clause like this:
NOT EXISTS (
select * from output_stream#length(100) as otherOutput
where a = otherOutput.a or b = otherOutput.b )
which works partly, for successive output the assertion o1.a = o2.a or o1.b = o2.b always holds.
However, when stream1 first delivers multiple "a"s and then stream2 delivers one "b", that matches the conditions to be joined with both "a"s, there are multiple outputs at once. This is not covered by my NOT EXISTS clause, because in the same step multiple outputs with the same "b" occur, and thus they are not yet in the output_stream.
The distinct keyword is not suitable here, since it checks all attributes together and not pairwise. Likewise, a simple group by on all attributes is unsuitable. I would love to have something like "distinct on a and distinct on b" as a criterion, but it does not exist.
I could possibly solve this with nested group bys where I group on each attribute
select first(*) from (select first(*) from output_stream group by a) group by b
but according to one comment has no well-defined semantics in stream processing systems. Thus, Esper does not allow subqueries in the from part of the query.
What I need is a way to force only output one output at a time and thus have the NOT EXISTS condition rechecked on every further output, or somehow check the outputs that occur at the same time against one another, before actually inserting them into the stream.
Update:
Timing of the output is not very critical. The output_stream will be used by other such statements, so I can account for delays by increasing the length of the windows. stream1 and stream2 deliver events in the order of their startTimestamp property.
create schema Pair(a string, b string);
create window PairWindow#length(100) as Pair;
insert into PairWindow select * from Pair;
on PairWindow as arriving select * from PairWindow as other
where arriving.a = other.a or arriving.b = other.b
Here is a sample self-join using a named window that keeps the last 100 pairs.
EDIT: Above query was designed for my understanding of the original requirements. Below query is designed for the new clarifications. It checks whether "a" or "b" had any previous value (in the last 100 events, leave #length(100) off as needed)
create schema Pair(a string, b string);
create window PairUniqueByA#firstunique(a)#length(100) as Pair;
create window PairUniqueByB#firstunique(b)#length(100) as Pair;
insert into PairUniqueByA select * from Pair;
insert into PairUniqueByB select * from Pair;
select * from Pair as pair
where not exists (select a from PairUniqueByA as uba where uba.a = pair.a)
and not exists (select a from PairUniqueByB as ubb where ubb.b = pair.b);

OrientDB SQL Check if multiple pairs of vertices are connected

I haven't been able to find an answer for the SQL for this.
Given pairs of vertices (record ids) and edge types between them, I want to check if all pairs exists.
V1 --E1--> V2
V3 --E2--> V4
... and so on. The answer I want is true / false or something equivalent. ALL connections must be present in order to evaluate to true, so at least one edge (of correct type) must exist for each pair.
Pseudo, the question would be:
Does V1 have edge <E1EdgeType> to V2?
AND
Does V3 have edge <E2EdgeType> to V4?
AND
... and so on
Does anyone know what the orientDB SQL would be to achieve this?
UPDATE
I did already have one way of checking if one single edge exists between known vertices. It's perhaps not very pretty either, but it works:
SELECT FROM (
SELECT EXPAND(out('TestEdge')) FROM #12:0
) WHERE #rid=#12:1
This will return the destination record (#12:0) if an edge of type 'TestEdge' exists from #12:0 to #12:1. However, if I have two of those, how can I query for one single result for both queries. Something like:
SELECT <something with $c> LET
$a = (SELECT FROM (SELECT EXPAND(out('TestEdge')) FROM #12:0) WHERE #rid=#12:1)
$b = (SELECT FROM (SELECT EXPAND(out('AnotherTestEdge')) FROM #12:2) WHERE #rid=#12:3)
$c = <something that checks that both a and b yield results>
That's what I aim towards doing. Please tell me if I'm solving this the wrong way. I'm not even sure what the gain is to merge queries like this compared to just repeat queries.
Given a pair of vertices, say #11:0 and #12:0, the following query will effectively check whether there is an edge of type E from #11:0
to #12:0
select from (select #this, out(E) from #11:0 unwind out) where out = #12:0
----+------+-----+-----
# |#CLASS|this |out
----+------+-----+-----
0 |null |#11:0|#12:0
----+------+-----+-----
This is highly inelegant and I would encourage you to think about formulating an enhancement request accordingly at https://github.com/orientechnologies/orientdb/issues
One way to incorporate the boolean tests you have in mind is illustrated by the following:
select from
(select $a.size() as a, $b.size() as b
let a=(select count(*) as e from (select out(E) from #11:0 unwind out)
where out = #12:0),
b=(select count(*) as e from (select out(E) from #11:1 unwind out)
where out = #12:2))
where a > 0 and b > 0
Yes, inelegance again :-(
It might be useful to you the following query
SELECT eval('sum($a.size(),$b.size())==2') as existing_edges
let $a = ( SELECT from TestEdge where out = #12:0 and in = #12:1 limit 1),
$b = ( SELECT from AnotherTestEdge where out = #12:2 and in = #12:3 limit 1)
Hope it helps.

Greatest N per group in Open SQL

Selecting the rows from a table by (partial) key with the maximum value in a particular column is a common task in SQL. This question has some excellent answers that cover a variety of approaches to it. Unfortunately I'm struggling to replicate this in my ABAP program.
None of the commonly used approaches seem to be supported:
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons: SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
After trying each of these solutions in turn only to discover that ABAP doesn't seem to support them and being unable to find any equivalents I'm starting to think that I'll have no choice but to dump the data of the subquery to an itab.
What is the best practice for this common programming requirement in ABAP development?
First of all, specific requirement, would give you a better answer. As it happens I bumped into this question when working on a program, that uses 3 distinct methods of pseudo-grouping, (while looking for alternatives) and ALL 3 can be used to answer your question, depending on what exactly you need to do. I'm sure there are more ways to do it.
For instance, you can pull maximum values within a group by simply selecting max( your_field ) and grouping by some fields, if that's all you need.
select bname, nation, max( date_from ) from adrp group by bname, nation. "selects highest "from" date for each bname
If you need to use that max value as a filter condition within a query, you can do it by performing pseudo-grouping using sub-query and max within sub-query like this (notice how I move out the BNAME check into sub query, which means I don't have to check both fields using in (subquery) addition):
select ... from adrp as b_adrp "Pulls the latest person info for a user (some conditions are missing, but this is a part of an actual query)
where b_adrp~date_from in (
select max( date_from ) "Highest date_from where both dates are valid
from adrp where persnumber = b_adrp~persnumber and nation = b_adrp~nation and date_from <= #sy-datum )
The query above allows you to select selects all user info from base query and (where the first one only allows to take aggregated and grouped data).
Finally, If you need to check based on composite key and compare it to multiple agregate function results, the implementation will heavily depend on specifics of your requirement (and since your question has none, I'll provide a generic one). Easiest option is to use exists / not exists instead of in (subquery), in exact same way and form the subquery to check for existance of specific key or condition rather than pull a list ( you can nest subqueries if you have to ):
select * from bkpf where exists ( select 1 from bkpf as b where belnr = bkpf~belnr and gjahr = bkpf~gjahr group by belnr, gjahr having max( budat ) = bkpf~budat ) "Took an available example, that I had in testing program.
All 3 queries will get you max value of a column within a group and in fact, all 3 can use joins to achieve identical results.
please find my answers below your questions.
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Putting the subquery in your where condition should do the work SELECT * FROM X AS x INNER JOIN Y AS y ON x~a = y~b WHERE ( SELECT * FROM y WHERE ... )
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
You have to split your WHERE clause: SELECT * FROM X WHERE key1 IN ( SELECT key1 FROM y ) AND key2 IN ( SELECT key2 FROM y )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons.
Yes, thats right at the moment.
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons:
SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
This is not true. This SELECT is perfectly valid:
SELECT b1~budat
INTO TABLE lt_bkpf
FROM bkpf AS b1
LEFT JOIN bkpf AS b2
ON b2~belnr < b1~belnr
WHERE b1~bukrs <> ''.
And was valid at least since 7.40 SP08, since July 2013, so at the time you asked this question it was valid as well.