Opposite of lastUnique() - complex-event-processing

I'm new to Esper and EPL in general, and I have two use cases that are basically opposites of one another. First, I need to catch all unique events in a time window, using firstunique(*parameters*).win:time(*time*).
Now I need to do the exact opposite: catch all events that arrive in that window and are NOT emitted by that statement, i.e. all the duplicates.
How can I achieve this? Thanks!

You could use a subquery and "not exists". For example:
select * from Event e1 where not exists (select * from Event#firstunique(*parameters*)#time(*time*) e2 where e1.parameters = e2.parameters)

I've actually found a solution. It involves giving incoming events unique IDs on top of comparing their parameters.
The query looks something like this:
select * from Event a where exists (select * from Event.std:firstUnique(*parameters*).win:time(*time*) b where a.eventId <> b.eventId)
This solves the problem I had where the exists method would return every event (duplicates and unique events) because the window in the subquery would be filled first.
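To make the intended split concrete, here is a minimal Python sketch of a simplified model. It is not EPL and does not claim to reproduce Esper's exact window semantics. The first event per key within a time window counts as unique, and any later event with the same key that arrives before that first event has left the window counts as a duplicate. The function name `split_unique_and_duplicates` and the `(timestamp, key)` event layout are assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Tuple

Event = Tuple[float, Hashable]  # (timestamp in seconds, key); hypothetical layout


def split_unique_and_duplicates(
    events: Iterable[Event], window: float
) -> Tuple[List[Event], List[Event]]:
    """Split time-ordered events into first-unique events and duplicates.

    Simplified model: an event is a duplicate if an earlier event with the
    same key was accepted as unique less than `window` seconds before it.
    """
    first_seen = {}  # key -> timestamp of the unique event still in the window
    uniques, duplicates = [], []
    for ts, key in events:
        seen_at = first_seen.get(key)
        if seen_at is not None and ts - seen_at < window:
            duplicates.append((ts, key))
        else:
            # Either the key is new or its earlier event has left the window.
            first_seen[key] = ts
            uniques.append((ts, key))
    return uniques, duplicates


events = [(0, "a"), (1, "b"), (2, "a"), (3, "a"), (12, "a")]
uniques, duplicates = split_unique_and_duplicates(events, window=10)
```

With a 10-second window, the events at 2 and 3 seconds land in `duplicates`, while the event at 12 seconds counts as unique again because the first "a" has expired.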

Related

Spring JPA repository and QueryDsl how to force left join

Let's say I have two entities User and Task, each user can have one task.
The issue I'm facing occurs when the user table has one record whose email starts with a and the task table has no records at all.
The snippet below returns no records, although I would expect it to return the users whose email starts with a.
UserRepository in this example extends QuerydslPredicateExecutor.
userRepository.findAll(
QUser.user.email.startsWith("a")
.or(QUser.user.task.text.contains("something"))
)
If I check the logs, Hibernate creates a cross join with user.task_id=task.id as part of the where clause. This type of join automatically discards users whose email starts with a if they don't have a task assigned.
Is there a way to force usage of left join instead of a cross join in findAll method of the repository?
I know I can do it by using JPAQuery but then I would have to reimplement paging functionality...
JPAQuery query = new JPAQuery(entityManager);
query
.from(QUser.user)
.leftJoin(QTask.task)
// ...
I am not sure if we can do that, since the findAll implementation is generated for us. However, we can pass a predicate to the findAll method, which will help deal with the issue you are encountering.
You can try to do something like this:
QUser qUser = QUser.user;
QTask qTask = QTask.task;
JPQLQuery<User> userJpqlQuery = JPAExpressions.selectFrom(qUser)
.leftJoin(qUser.task, qTask)
.where(qUser.email...., qTask.text...);
userRepository.findAll(qUser.in(userJpqlQuery));
In the code above I have used Querydsl, which is a type-safe alternative to CriteriaBuilder. I have created a subquery that makes the selection I want, and findAll then returns all users matching the subquery.
In the end, Hibernate should generate something like this:
select * from User qUser0 where qUser0.id in (
select qUser1.id from User qUser1
left join Task qTask0 on
qUser1.taskId = qTask0.id
where ...
);
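The difference between the two join shapes can be checked outside of Hibernate. The following is a small sketch using Python's built-in sqlite3 module, with hypothetical `users` and `tasks` tables standing in for the entities from the question; the SQL is hand-written for illustration and is not the SQL Hibernate actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, task_id INTEGER);
    INSERT INTO users (id, email, task_id) VALUES (1, 'alice@example.com', NULL);
    """
)

# Cross join plus a join condition in WHERE: behaves like an inner join,
# so a user without a task can never match.
cross_join_rows = conn.execute(
    """
    SELECT u.id FROM users u, tasks t
    WHERE u.task_id = t.id
      AND (u.email LIKE 'a%' OR t.text LIKE '%something%')
    """
).fetchall()

# Left join inside an IN subquery: users without a task are kept.
left_join_rows = conn.execute(
    """
    SELECT u0.id FROM users u0
    WHERE u0.id IN (
        SELECT u1.id FROM users u1
        LEFT JOIN tasks t ON u1.task_id = t.id
        WHERE u1.email LIKE 'a%' OR t.text LIKE '%something%'
    )
    """
).fetchall()
```

With an empty `tasks` table, `cross_join_rows` is empty while `left_join_rows` contains the user whose email starts with a.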

Cumulocity CEP Event Query depending on other event

I have an event that is triggered when a device starts a process with a unique process id.
When the process stops it sends another event with its Timestamp and the same process id.
Now I want to calculate the total process time, i.e. subtract the Timestamp of the start event from the Timestamp of the end event.
I tried multiple ways to accomplish this but they all failed.
Is it possible to save an item from a query to a variable?
e.g.
select
@var = d.ProcessID
from table d
or is it possible to make subqueries??
e.g.
select
d.TimeStamp
from table d
where d.ProcessID = (select
e.ProcessID
from table e)
Or if anyone has a different suggestion it would be great to have some input :)
Thanks in advance
Greets
You can use patterns to achieve this. Something like this might work:
select * from pattern [every a=StartEvent -> b=StopEvent(sourceId = a.sourceId, processId = a.processId)]
For more info have a look at the Esper doc.
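What the pattern does can be sketched in plain Python: remember each start event by source and process ID, and when the matching stop event arrives, emit the difference of the two timestamps. This is only an illustration of the pairing logic, not how Esper evaluates patterns; the event layout and the name `process_durations` are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# (kind, source_id, process_id, timestamp in seconds); hypothetical layout
Event = Tuple[str, str, str, float]


def process_durations(events: Iterable[Event]) -> List[Tuple[str, str, float]]:
    """Pair each start event with the next stop event for the same source and process."""
    open_starts: Dict[Tuple[str, str], float] = {}
    durations = []
    for kind, source_id, process_id, ts in events:
        key = (source_id, process_id)
        if kind == "start":
            open_starts[key] = ts
        elif kind == "stop" and key in open_starts:
            # Duration = stop timestamp minus the remembered start timestamp.
            durations.append((source_id, process_id, ts - open_starts.pop(key)))
    return durations


events = [
    ("start", "dev1", "p1", 100.0),
    ("start", "dev1", "p2", 105.0),
    ("stop", "dev1", "p1", 130.0),
    ("stop", "dev1", "p2", 165.0),
]
durations = process_durations(events)
```

In EPL the same subtraction would go into the select clause of the pattern statement, using the timestamps of the `a` and `b` events.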

Replace correlated subquery with join

I'd like to replace the following ABAP OpenSQL snippet (in the where clause of a much bigger statement) with an equivalent join.
... AND tf~tarifart = ( SELECT MAX( tf2~tarifart ) FROM ertfnd AS tf2 WHERE tf2~tariftyp = e1~tariftyp AND tf2~bis >= e1~bis AND tf2~ab <= e1~ab ) ...
My motivation: migrating the query to ABAP CDS views (basically plain SQL with somewhat reduced expressiveness in comparison). Alas, correlated subqueries and EXISTS statements are not supported.
I googled a bit and found a possible solution (last post) here https://archive.sap.com/discussions/thread/3824523
However, the proposal
Selecting MAX(value)
Your scenario using inner join to first CDS view
doesn't work in my case.
tf.bis (and tf.ab) need to be in the selection list of the new view to limit the rhs of the join (new view) to the correct time frames.
Alas, there could be multiple (non overlapping) sub time frames (contained within [tf.ab, tf.bis]) with the same tf.tarifart.
Since these couldn't be grouped together, this results in multiple rows on the rhs.
The original query does not have a problem with that (no join -> no Cartesian product).
I hope the following fiddle (working example) clears things up a bit: http://sqlfiddle.com/#!9/8d1f48/3
Given these constraints, to me it seems that an equivalent join is indeed impossible. Suggestions or even confirmations?
select doc_belzart,
doc_tariftyp,
doc_ab,
doc_bis,
max(tar_tarifart)
from
(
select document.belzart as doc_belzart,
document.tariftyp as doc_tariftyp,
document.ab as doc_ab,
document.bis as doc_bis,
tariff.tarifart as tar_tarifart,
tariff.tariftyp as tar_tariftyp,
tariff.ab as tar_ab,
tariff.bis as tar_bis
from dberchz1 as document
inner join ertfnd as tariff
on tariff.tariftyp = document.tariftyp and
tariff.ab <= document.ab and
tariff.bis >= document.bis
) as max_tariff
group by doc_belzart,
doc_tariftyp,
doc_ab,
doc_bis
Translated into English, you seem to want to determine the max applicable tariff for a set of documents.
I'd refactor this into separate steps:
Determine all applicable tariffs, meaning all tariffs that completely cover the document's time interval. This will become your first CDS view, and in my answer forms the sub-query.
Determine for all documents the max applicable tariff. This will form your second CDS view, and in my answer forms the outer query. This one has the MAX / GROUP BY to reduce the result set to one per document.
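The two steps can be tried out with Python's built-in sqlite3 module, using ordinary SQL views as stand-ins for the two CDS views. The view names `applicable_tariffs` and `max_tariff` and the sample rows are invented for illustration; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dberchz1 (belzart TEXT, tariftyp TEXT, ab TEXT, bis TEXT);
    CREATE TABLE ertfnd (tariftyp TEXT, tarifart TEXT, ab TEXT, bis TEXT);

    INSERT INTO dberchz1 VALUES
        ('A', 'T1', '2020-01-01', '2020-06-30'),
        ('B', 'T1', '2020-07-01', '2020-12-31');
    INSERT INTO ertfnd VALUES
        ('T1', 'X10', '2019-01-01', '2020-12-31'),
        ('T1', 'X20', '2020-01-01', '2020-06-30'),
        ('T1', 'X30', '2020-03-01', '2020-12-31');

    -- Step 1: every tariff that completely covers the document's interval.
    CREATE VIEW applicable_tariffs AS
    SELECT d.belzart, d.tariftyp, d.ab, d.bis, t.tarifart
    FROM dberchz1 AS d
    INNER JOIN ertfnd AS t
        ON t.tariftyp = d.tariftyp AND t.ab <= d.ab AND t.bis >= d.bis;

    -- Step 2: reduce to one row per document with MAX / GROUP BY.
    CREATE VIEW max_tariff AS
    SELECT belzart, tariftyp, ab, bis, MAX(tarifart) AS max_tarifart
    FROM applicable_tariffs
    GROUP BY belzart, tariftyp, ab, bis;
    """
)

rows = conn.execute(
    "SELECT belzart, max_tarifart FROM max_tariff ORDER BY belzart"
).fetchall()
```

Because the tariff's own ab and bis are not part of the second view's grouping, several covering tariffs per document collapse into a single row.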

COUNT(field) returns correct amount of rows but full SELECT query returns zero rows

I have a UDF in my database which basically tries to get a station (e.g. bus/train) based on some input data (geographic/name/type). Inside this function I try to check if there are any rows matching the given values:
SELECT
COUNT(s.id)
INTO
firsttry
FROM
geographic.stations AS s
WHERE
ST_DWithin(s.the_geom,plocation,0.0017)
AND
s.name <-> pname < 0.8
AND
s.type ~ stype;
The firsttry variable now contains the value 1. If I use the following (slightly extended) SELECT statement, I get no results:
RETURN query SELECT
s.id, s.name, s.type, s.the_geom,
similarity(
regexp_replace(s.name::text,'(Hauptbahnhof|Hbf)','Hbf'),
regexp_replace(pname::text,'(Hauptbahnhof|Hbf)','Hbf')
)::double precision AS sml,
st_distance(s.the_geom,plocation) As dist from geographic.stations AS s
WHERE ST_DWithin(s.the_geom,plocation,0.0017) and s.name <-> pname < 0.8
AND s.type ~ stype
ORDER BY dist asc,sml desc LIMIT 1;
The parameters are as follows:
stype = '^railway'
pname = 'Amsterdam Science Park'
plocation = ST_GeomFromEWKT('SRID=4326;POINT(4.9492530 52.3531670)')
The tuple I need to be returned is:
id name type geom (displayed as ST_AsText)
909658;"Amsterdam Sciencepark";"railway_station";"POINT(4.9482893 52.352904)"
The same UDF works well for many other stations, but this is one of several for which it just won't work. Any suggestions?
P.S. The use of the <-> operator is coming from the pg_trgm module.
Some ideas on how to troubleshoot this:
Break your troubleshooting into steps. Start with the simplest query possible. No aggregates, just joins and no filters. Then add filters. Then add order by, then add aggregates. Look at exactly where the change occurs.
Try reindexing the database.
One possibility that occurs to me based on this is that it could be a corrupted index used in the second query but not the first. I have seen corrupted indexes in the past and usually they throw errors but at least in theory they should be able to create a problem like this.
If this is correct, your query will suddenly return rows if you remove the ORDER BY clause.
If you have a corrupted index, then you need to pay close attention to hardware. Is the RAM ECC? Is the processor overheating? How are your disks doing?
A second possibility is that there is a typo in a join condition or filter statement. Normally this is something I would suspect first, but index problems are easy enough to rule out that it makes sense to start there. If removing the ORDER BY doesn't change things, then chances are it is a typo. If you can't find a typo, then try reindexing.
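The "add one filter at a time and watch where the row count changes" step can be automated. Below is a rough sketch using Python's built-in sqlite3 module and a made-up `stations` table; the helper name `count_after_each_filter` is hypothetical, and the PostGIS and pg_trgm predicates from the question are replaced by plain comparisons because they are not available in SQLite.

```python
import sqlite3
from typing import List, Tuple


def count_after_each_filter(
    conn: sqlite3.Connection, table: str, filters: List[str]
) -> List[Tuple[str, int]]:
    """Return the row count after cumulatively adding each WHERE predicate.

    `table` and `filters` are trusted SQL fragments written by the person
    debugging; do not pass user input here.
    """
    results = []
    applied: List[str] = []
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    results.append(("<no filter>", count))
    for predicate in filters:
        applied.append(f"({predicate})")
        where = " AND ".join(applied)
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {where}"
        ).fetchone()
        results.append((predicate, count))
    return results


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stations (id INTEGER, name TEXT, type TEXT);
    INSERT INTO stations VALUES
        (1, 'Amsterdam Sciencepark', 'railway_station'),
        (2, 'Amsterdam Centraal', 'railway_station'),
        (3, 'Science Park', 'bus_stop');
    """
)
report = count_after_each_filter(
    conn, "stations", ["type LIKE 'railway%'", "name = 'Amsterdam Science Park'"]
)
```

The first predicate whose count drops to zero is the one to inspect more closely.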

How to avoid T-SQL function being called more times when needing combined results?

I have two T-SQL scalar functions that both perform calculations over large sums of data (taking 'a lot' of time) and return a value, e.g. CalculateAllIncomes(EmployeeID) and CalculateAllExpenditures(EmployeeID).
I run a select statement that calls these and returns results for each Employee. I also need the balance of each employee calculated as AllIncomes-AllExpenditures.
I have a function GetBalance(EmployeeID) that calls the two above mentioned functions and returns the result {CalculateAllIncomes(EmployeeID) - CalculateAllExpenditures(EmployeeID)}. But if I do:
Select CalculateAllIncomes(EmployeeID), CalculateAllExpenditures(EmployeeID), GetBalance(EmployeeID) .... the functions CalculateAllIncomes() and CalculateAllExpenditures() get called twice (once explicitly and once inside the GetBalance function), so the resulting query takes twice as long as it should.
I'd like to find some better solution. I tried:
select CalculateAllIncomes(EmployeeID) AS Incomes, CalculateAllExpenditures(EmployeeID) AS Expenditures,
(Incomes - Expenditures) AS Balance....
but it throws errors:
Invalid column name Incomes and
Invalid column name Expenditures.
I'm sure there has to be a simple solution, but I cannot figure it out. For some reason it seems that I am not able to use column Aliases in the SELECT clause. Is it so? And if so, what could be the workaround in this case?
Thanks for any suggestions.
Forget function calls: you can probably do everything in one normal query.
Misused function calls (an attempt at OO encapsulation) force you into this situation. In addition, if you call GetBalance(EmployeeID) per row in the Employee table, then you are effectively CURSORing over the table. And you've now compounded this with multiple calls too.
What you need is something like this:
;WITH cSUMs AS
(
SELECT
SUM(CASE WHEN type = 'Incomes' THEN SomeValue ELSE 0 END) AS Income,
SUM(CASE WHEN type = 'Expenditures' THEN SomeValue ELSE 0 END) AS Expenditure
FROM
MyTable
WHERE
EmployeeID = @empID --optional for all employees
GROUP BY
EmployeeID
)
SELECT
Income, Expenditure, Income - Expenditure
FROM
cSUMs
I once got a query down from a weekend to under a second by eliminating this kind of OO thinking from a bog-standard, set-based aggregate query.
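The conditional-aggregation approach can be tried with Python's built-in sqlite3 module. The sketch below uses SQLite rather than SQL Server, computes the sums for all employees (no WHERE filter), and uses invented sample rows; the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MyTable (EmployeeID INTEGER, type TEXT, SomeValue REAL);
    INSERT INTO MyTable VALUES
        (1, 'Incomes', 100.0), (1, 'Incomes', 50.0), (1, 'Expenditures', 30.0),
        (2, 'Incomes', 80.0), (2, 'Expenditures', 90.0);
    """
)

# Each sum is computed once in the CTE; the outer query can then refer to
# the Income and Expenditure aliases to derive the balance.
rows = conn.execute(
    """
    WITH cSUMs AS (
        SELECT
            EmployeeID,
            SUM(CASE WHEN type = 'Incomes' THEN SomeValue ELSE 0 END) AS Income,
            SUM(CASE WHEN type = 'Expenditures' THEN SomeValue ELSE 0 END) AS Expenditure
        FROM MyTable
        GROUP BY EmployeeID
    )
    SELECT EmployeeID, Income, Expenditure, Income - Expenditure AS Balance
    FROM cSUMs
    ORDER BY EmployeeID
    """
).fetchall()
```

This also shows the workaround for the alias problem from the question: an alias cannot be referenced in the same SELECT list that defines it, but it can be referenced one level up, in the query that selects from the CTE.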