Esper distinct events on multiple attributes - group-by

I have a problem with the stream semantics in Esper. My aim is to output only events with pairwise distinct attributes. Additionally, there are temporal conditions which have to hold between the attributes (see Espers Interval Algebra Reference).
An example statement:
insert into output_stream select a.*, b.*
from stream1#length(100) as a, stream2#length(100) as b
where a.before(b) or a.meets(b) or a.overlaps(b)
Pairwise distinct attributes means, I want to ensure that there are no two outputs o1, o2 where o1.a = o2.a or o1.b = o2.b. To give a more concrete example, if there are the results
o1: (a = a1, b = b1),
o2: (a = a1, b = b2),
o3: (a = a2, b = b2),
o4: (a = a2, b = b1)
only two of them shall be output (e.g. o1 and o3 or o2 and o4). Which one does not matter for now.
I wanted to accomplish the pairwise distinct attributes with a NOT EXISTS clause like this:
NOT EXISTS (
select * from output_stream#length(100) as otherOutput
where a = otherOutput.a or b = otherOutput.b )
which works partly, for successive output the assertion o1.a = o2.a or o1.b = o2.b always holds.
However, when stream1 first delivers multiple "a"s and then stream2 delivers one "b", that matches the conditions to be joined with both "a"s, there are multiple outputs at once. This is not covered by my NOT EXISTS clause, because in the same step multiple outputs with the same "b" occur, and thus they are not yet in the output_stream.
The distinct keyword is not suitable here, since it checks all attributes together and not pairwise. Likewise, a simple group by on all attributes is unsuitable. I would love to have something like "distinct on a and distinct on b" as a criterion, but it does not exist.
I could possibly solve this with nested group bys where I group on each attribute
select first(*) from (select first(*) from output_stream group by a) group by b
but according to one comment has no well-defined semantics in stream processing systems. Thus, Esper does not allow subqueries in the from part of the query.
What I need is a way to force only output one output at a time and thus have the NOT EXISTS condition rechecked on every further output, or somehow check the outputs that occur at the same time against one another, before actually inserting them into the stream.
Update:
Timing of the output is not very critical. The output_stream will be used by other such statements, so I can account for delays by increasing the length of the windows. stream1 and stream2 deliver events in the order of their startTimestamp property.

create schema Pair(a string, b string);
create window PairWindow#length(100) as Pair;
insert into PairWindow select * from Pair;
on PairWindow as arriving select * from PairWindow as other
where arriving.a = other.a or arriving.b = other.b
Here is a sample self-join using a named window that keeps the last 100 pairs.
EDIT: Above query was designed for my understanding of the original requirements. Below query is designed for the new clarifications. It checks whether "a" or "b" had any previous value (in the last 100 events, leave #length(100) off as needed)
create schema Pair(a string, b string);
create window PairUniqueByA#firstunique(a)#length(100) as Pair;
create window PairUniqueByB#firstunique(b)#length(100) as Pair;
insert into PairUniqueByA select * from Pair;
insert into PairUniqueByB select * from Pair;
select * from Pair as pair
where not exists (select a from PairUniqueByA as uba where uba.a = pair.a)
and not exists (select a from PairUniqueByB as ubb where ubb.b = pair.b);

Related

Return closest timestamp from Table B based on timestamp from Table A with matching Product IDs

Goal: Create a query to pull the closest cycle count event (Table C) for a product ID based on the inventory adjustments results sourced from another table (Table A).
All records from Table A will be used, but is not guaranteed to have a match in Table C.
The ID column will be present in both tables, but is not unique in either, so that pair of IDs and Timestamps together are needed for each table.
Current simplified SQL
SELECT
A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM
A
LEFT JOIN
C
ON A.LPID = C.LPID
WHERE
A.facility = 'FACID'
AND A.WHENOCCURRED > '23-DEC-22'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC
;
This is currently pulling the first hit on C.WHENOCCURRED on the LPID matches. Want to see if there is a simpler JOIN solution before going in a direction that creates 2 temp tables based on WHENOCCURRED.
I have a functioning INDEX(MATCH(MIN()) solution in Excel but that requires exporting a couple system reports first and is extremely slow with X,XXX row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
SELECT *
FROM C
WHERE A.LPID = C.LPID
AND A.whenoccurred <= c.whenoccurred
ORDER BY c.whenoccurred
FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- The join condition is inside the lateral join
WHERE A.facility = 'FACID'
AND A.WHENOCCURRED > DATE '2022-12-23'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;

how to make a logical exclusion within the query?

I have 3 tables in our ERP database holding all delivery data (table documents holds one row for each delivery note, documentpos holds all positions on delivery notes, documentserialnumbers holds all serial numbers for delivered items).
I want to show all items with their serial number that have been delivered to the customer and still resides there so far.
The above shown output of the following query however shows, that one item that has been delivered was returned (red marks) later. The return delivery note has the document number 527419 (dark red mark) and refers to the the delivery note 319821 (green) which is listed yellow.
The correct list would consequential show only items that are still on customer's site without the returned items (see below).
How do I have to change the query in order to exclude the returned items from the output?
The upper table shows in the image shows the output of my query, the table below how it should be.
select a.BelID, c.ReferenzBelID, a.itemnumber, a.itemname, c.deliverynotenumber,c.documenttype, c.documentmark, b.serialnumber
from dbo.documentpos a
inner join dbo.documentserialnumbers b on a.BelPosID = b.BelPosID
inner join dbo.documents c on a.BelID = c.BelID
inner join sysdba.customers d on d.account = c.A0Name1
where d.AccountID = 'customername' and c.documenttype like '%delivery%'
order by a.BelID
You may exclude positions, which are referenced by any "return" delivery note, like this (edited)
select a.BelID, c.ReferenzBelID, a.itemnumber, a.itemname, c.deliverynotenumber,c.documenttype, c.documentmark, b.serialnumber
from dbo.documentpos a
inner join dbo.documentserialnumbers b on a.BelPosID = b.BelPosID
inner join dbo.documents c on a.BelID = c.BelID
inner join sysdba.customers d on d.account = c.A0Name1
where d.AccountID = 'customername' and c.documenttype like '%delivery%'
and not exists (select 1
from dbo.documents cc
where cc.documenttype like '%delivery%'
and c.ReferenzBelID=cc.BelID
and c.documentmark='VLR')
and not exists (select 1
from dbo.documents ccc
join dbo.documentpos aa on aa.BelID = ccc.BelID
where ccc.ReferenzBelID=c.BelID
and ccc.documentmark='VLR'
and a.itemnumber=aa.itemnumber)
order by a.BelID

OrientDb Vertices with most Mutual Friends

I'm trying to find the pairs of vertices that have the greatest number of common vertices between them. It is very similar to the 'number of mutual friends' example used in many graph database demos. I can determine the number of mutual vertices between a pair of known vertices using this:
SELECT Expand($query) LET
$query1 = (SELECT Expand(outE().in) FROM #1:2,
$query2 = (SELECT Expand(OutE().in) FROM #1:3,
$query = Intersect($query1,$query2);
The Count() of the above query's result is the number of common vertices.
However, I can't figure out how to aggregate that query across my entire data set. My best solution has been a brute force, where I iterate through each vertex and run the above query against all other vertices (technically, I do all the vertices 'after' that vertex).
My solution is inefficient and had to be coded in C# rather than done entirely in SQL. How can this be done using OrientDb's SQL?
You can use a SELECT with a MATCH:
SELECT FROM (
SELECT a, b, count(friend) as nFriends from (
MATCH
{class:Person, as:a} -FriendOf- {as:friend} -FriendOf-{as:b, where:($matched.a != $currentMatch)}
RETURN a, b, friend
)
) ORDER BY nFriends
Slight modification to #Luigi's answer:
SELECT a, b, Count(friend) AS nFriends FROM (
MATCH
{class:Person, as:a} -E- {as:friend} -E- {class:Person, as:b, where:($matched.a != $currentMatch)}
RETURN a, b, friend
) GROUP BY a, b ORDER BY nFriends DESC
I needed the GROUP BY or I just get one big count.

OrientDB SQL Check if multiple pairs of vertices are connected

I haven't been able to find an answer for the SQL for this.
Given pairs of vertices (record ids) and edge types between them, I want to check if all pairs exists.
V1 --E1--> V2
V3 --E2--> V4
... and so on. The answer I want is true / false or something equivalent. ALL connections must be present in order to evaluate to true, so at least one edge (of correct type) must exist for each pair.
Pseudo, the question would be:
Does V1 have edge <E1EdgeType> to V2?
AND
Does V3 have edge <E2EdgeType> to V4?
AND
... and so on
Does anyone know what the orientDB SQL would be to achieve this?
UPDATE
I did already have one way of checking if one single edge exists between known vertices. It's perhaps not very pretty either, but it works:
SELECT FROM (
SELECT EXPAND(out('TestEdge')) FROM #12:0
) WHERE #rid=#12:1
This will return the destination record (#12:0) if an edge of type 'TestEdge' exists from #12:0 to #12:1. However, if I have two of those, how can I query for one single result for both queries. Something like:
SELECT <something with $c> LET
$a = (SELECT FROM (SELECT EXPAND(out('TestEdge')) FROM #12:0) WHERE #rid=#12:1)
$b = (SELECT FROM (SELECT EXPAND(out('AnotherTestEdge')) FROM #12:2) WHERE #rid=#12:3)
$c = <something that checks that both a and b yield results>
That's what I aim towards doing. Please tell me if I'm solving this the wrong way. I'm not even sure what the gain is to merge queries like this compared to just repeat queries.
Given a pair of vertices, say #11:0 and #12:0, the following query will effectively check whether there is an edge of type E from #11:0
to #12:0
select from (select #this, out(E) from #11:0 unwind out) where out = #12:0
----+------+-----+-----
# |#CLASS|this |out
----+------+-----+-----
0 |null |#11:0|#12:0
----+------+-----+-----
This is highly inelegant and I would encourage you to think about formulating an enhancement request accordingly at https://github.com/orientechnologies/orientdb/issues
One way to incorporate the boolean tests you have in mind is illustrated by the following:
select from
(select $a.size() as a, $b.size() as b
let a=(select count(*) as e from (select out(E) from #11:0 unwind out)
where out = #12:0),
b=(select count(*) as e from (select out(E) from #11:1 unwind out)
where out = #12:2))
where a > 0 and b > 0
Yes, inelegance again :-(
It might be useful to you the following query
SELECT eval('sum($a.size(),$b.size())==2') as existing_edges
let $a = ( SELECT from TestEdge where out = #12:0 and in = #12:1 limit 1),
$b = ( SELECT from AnotherTestEdge where out = #12:2 and in = #12:3 limit 1)
Hope it helps.

Greatest N per group in Open SQL

Selecting the rows from a table by (partial) key with the maximum value in a particular column is a common task in SQL. This question has some excellent answers that cover a variety of approaches to it. Unfortunately I'm struggling to replicate this in my ABAP program.
None of the commonly used approaches seem to be supported:
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons: SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
After trying each of these solutions in turn only to discover that ABAP doesn't seem to support them and being unable to find any equivalents I'm starting to think that I'll have no choice but to dump the data of the subquery to an itab.
What is the best practice for this common programming requirement in ABAP development?
First of all, specific requirement, would give you a better answer. As it happens I bumped into this question when working on a program, that uses 3 distinct methods of pseudo-grouping, (while looking for alternatives) and ALL 3 can be used to answer your question, depending on what exactly you need to do. I'm sure there are more ways to do it.
For instance, you can pull maximum values within a group by simply selecting max( your_field ) and grouping by some fields, if that's all you need.
select bname, nation, max( date_from ) from adrp group by bname, nation. "selects highest "from" date for each bname
If you need to use that max value as a filter condition within a query, you can do it by performing pseudo-grouping using sub-query and max within sub-query like this (notice how I move out the BNAME check into sub query, which means I don't have to check both fields using in (subquery) addition):
select ... from adrp as b_adrp "Pulls the latest person info for a user (some conditions are missing, but this is a part of an actual query)
where b_adrp~date_from in (
select max( date_from ) "Highest date_from where both dates are valid
from adrp where persnumber = b_adrp~persnumber and nation = b_adrp~nation and date_from <= #sy-datum )
The query above allows you to select selects all user info from base query and (where the first one only allows to take aggregated and grouped data).
Finally, If you need to check based on composite key and compare it to multiple agregate function results, the implementation will heavily depend on specifics of your requirement (and since your question has none, I'll provide a generic one). Easiest option is to use exists / not exists instead of in (subquery), in exact same way and form the subquery to check for existance of specific key or condition rather than pull a list ( you can nest subqueries if you have to ):
select * from bkpf where exists ( select 1 from bkpf as b where belnr = bkpf~belnr and gjahr = bkpf~gjahr group by belnr, gjahr having max( budat ) = bkpf~budat ) "Took an available example, that I had in testing program.
All 3 queries will get you max value of a column within a group and in fact, all 3 can use joins to achieve identical results.
please find my answers below your questions.
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Putting the subquery in your where condition should do the work SELECT * FROM X AS x INNER JOIN Y AS y ON x~a = y~b WHERE ( SELECT * FROM y WHERE ... )
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
You have to split your WHERE clause: SELECT * FROM X WHERE key1 IN ( SELECT key1 FROM y ) AND key2 IN ( SELECT key2 FROM y )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons.
Yes, thats right at the moment.
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons:
SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
This is not true. This SELECT is perfectly valid:
SELECT b1~budat
INTO TABLE lt_bkpf
FROM bkpf AS b1
LEFT JOIN bkpf AS b2
ON b2~belnr < b1~belnr
WHERE b1~bukrs <> ''.
And was valid at least since 7.40 SP08, since July 2013, so at the time you asked this question it was valid as well.