Hierarchical rule triggering with Esper - complex-event-processing

We have a stream of events, each with the following properties:
public class Event {
    private String id;   // event identifier
    private String src;  // source address
    private String dst;  // destination address

    // Esper accesses event properties through JavaBean-style getters
    public String getId() { return id; }
    public String getSrc() { return src; }
    public String getDst() { return dst; }
}
In addition, we have a set of hierarchical (nested) rules that we want to model with Esper EPL. Each rule should be applied if and only if all of its parent rules have already been activated (a matching instance occurred for each of them). For example:
2 or more events with the same src and dst within 10 seconds
+ 5 or more events with the same src and dst as in the rule above, within 20 seconds
+ 100 or more events with the same src and dst as in the rules above, within 30 seconds
We want to retrieve all event instances corresponding to each level of this rule hierarchy.
For example, consider the following events:
id    source         destination    arrival time (s)
1     192.168.1.1    192.168.1.2    1
2     192.168.1.1    192.168.1.2    2
3     192.168.1.1    192.168.1.3    3
4     192.168.1.1    192.168.1.2    4
5     192.168.1.5    192.168.1.8    5
6     192.168.1.1    192.168.1.2    6
7     192.168.1.1    192.168.1.2    7
8     192.168.1.1    192.168.1.2    8
...
plus 100 other events from 192.168.1.1 to 192.168.1.2 arriving within less than 20 seconds
We want our rule hierarchy to report this instance together with the ids of all events corresponding to each level of the hierarchy. For example, something like the following report is required:
2 or more events with src 192.168.1.1 and dst 192.168.1.2 in 10 seconds (Ids: 1,2)
+ 5 or more with the same src (192.168.1.1) and dst (192.168.1.2) in 20s (Ids: 1,2,4,6,7)
+ 100 or more events from 192.168.1.1 to 192.168.1.2 in 30s (Ids: 1,2,4,6,7,8,...)
How can we achieve this (retrieve the ids of the events matched with all rules) in Esper EPL?

Complex use case; it will take some time to model this. I'd start simple, keeping a named window and using some match-recognize or EPL patterns. For the rule nesting, I'd propose triggering other statements using insert-into. A context partition that gets initiated by a triggering event may also come in handy. For the events that are shared between rules, if any, go against the named window using a join or subquery, for example. For the events that arrive after a triggering event of the first or second rule, just use EPL statements that consume the triggering event. Start simple and build it up; become familiar with insert-into and declaring an overlapping context.
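For instance, a rough (untested) sketch of those building blocks; all statement, stream, and context names here are invented:

// level 1: 2 or more events with the same src and dst within 10 seconds
insert into Level1Match
select src, dst, window(id) as ids
from Event#time(10 sec)
group by src, dst
having count(*) >= 2;

// overlapping context opened by each level-1 match, kept alive for the next 20 seconds
create context Level2Window initiated by Level1Match as l1 terminated after 20 sec;

// level 2: 5 or more events with that same src/dst while the partition is alive
context Level2Window
insert into Level2Match
select context.l1.src as src, context.l1.dst as dst, window(id) as ids
from Event(src = context.l1.src, dst = context.l1.dst)#time(20 sec)
having count(*) >= 5;

The third level would follow the same pattern, with a 30-second context initiated by Level2Match.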

You could use each rule as the input for the next rule in the hierarchy. For example, the first rule listens for matching events in the last 10 seconds and inserts its results with the 'insert into' clause into a new stream, so the next rule is triggered by events in the new stream, and so on. It is a pretty simple use case and can be done even without context partitions.
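Roughly along these lines (untested; the derived stream names Pair10s and Pair20s are invented):

// level 1 inserts its matches into a derived stream
insert into Pair10s
select src, dst, window(id) as ids
from Event#time(10 sec)
group by src, dst
having count(*) >= 2;

// level 2 only counts events whose src/dst already produced a Pair10s row
insert into Pair20s
select e.src, e.dst, window(e.id) as ids
from Event#time(20 sec) as e
where exists (select * from Pair10s#time(20 sec) as p where p.src = e.src and p.dst = e.dst)
group by e.src, e.dst
having count(*) >= 5;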

Related

Spring batch, compare current processed record with the rest of chunk records

I need to plan a solution for this case. I have a table like the one below, and I have to reduce the records that share Product+Service+Origin to the minimum possible set of date ranges:
ID  PRODUCT  SERVICE  ORIGIN  STARTDATE   ENDDATE
1   100001   500      1       10/01/2023  15/01/2023
2   100001   500      1       12/01/2023  18/01/2023
I have to read all the records and, while processing, check the date intervals in order to unify them:
RecordA (10/01/2023 - 15/01/2023) and RecordB (12/01/2023 - 18/01/2023) overlap, so the record with ID 1 should be updated to the widest range spanned by the two records: 10/01/2023 - 18/01/2023 (extending one of the ranges to the "right" or to the "left" when necessary).
Another case:
ID  PRODUCT  SERVICE  ORIGIN  STARTDATE   ENDDATE
1   100001   500      1       10/01/2023  15/01/2023
2   100001   500      1       12/01/2023  14/01/2023
In this case, the date range of record 1 is the widest and fully contains record 2, so we should delete record 2.
The same goes, of course, for records with duplicate date ranges, which should also be deleted.
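As a rough illustration of the pairwise comparison described above, something like this (class and field names are invented) would decide between "update" and "delete" for two date ranges:

import java.time.LocalDate;

// Hypothetical holder for one record's date range
class DateRange {
    LocalDate start;
    LocalDate end;

    DateRange(LocalDate start, LocalDate end) { this.start = start; this.end = end; }

    // the two ranges share at least one day
    boolean overlaps(DateRange other) {
        return !start.isAfter(other.end) && !other.start.isAfter(end);
    }

    // this range fully contains the other one, so the other record can be deleted
    boolean contains(DateRange other) {
        return !start.isAfter(other.start) && !end.isBefore(other.end);
    }

    // extend this range to the "left" and/or "right" so it covers the other one,
    // after which the other record can be deleted as well
    void mergeWith(DateRange other) {
        if (other.start.isBefore(start)) start = other.start;
        if (other.end.isAfter(end)) end = other.end;
    }
}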
Right now we have implemented a chunk step to make this possible:
Reader: reads the data ordered by the common fields (Product-Service-Origin).
Processor: stores all the records in a HashMap<String, List> in the job context while the "Product+Service+Origin" combination stays the same. When a new combination is detected, it takes the current list, performs many pairwise comparisons within it, marks records via auxiliary properties as "delete" or "update", and sends the full list to the writer (after first starting a new list in the map for the new combination of common fields).
Writer: groups the records to delete and update and calls child writers to execute the statements.
Well, this is how the software has worked for several years, but soon we will have to handle massive numbers of records for each case, and the "solution" of keeping a map in the JobContext has to change.
I was wondering whether Spring Batch has some features I could use to process this type of situation.
Anyway, I am thinking about changing the step where we insert all these records and doing the date-range checks one by one in the processor, but I think the commit interval would then have to be 1 so that each record can be checked against all previously processed records (the table is initially empty when we execute this job). With any other commit interval, the check would go against the database but not against the items processed earlier in the same chunk, which makes the processing incorrect.
Each of these cases can have 0-n records sharing Product+Service+Origin.
Sorry for my English; it is difficult to explain this in another language.

Selecting multiple values/aggregators from Influxdb, excluding time

I've got an InfluxDB table consisting of
> SELECT * FROM results
name: results
time                 artnum  duration
----                 ------  --------
1539084104865933709  1234    34
1539084151822395648  1234    81
1539084449707598963  2345    56
1539084449707598123  2345    52
and other tags. Both artnum and duration are fields (that is changeable, though). I'm now trying to create a query (to use in Grafana) that gives me the following result, with a calculated mean() and the number of measurements for that artnum:
artnum  mean_duration  no. of measurements
------  -------------  -------------------
1234    58             2
2345    54             2
First of all: Is it possible to exclude the time column? Secondly, what is the influx db way to create such a table? I started with
SELECT mean("duration"), "artnum" FROM "results"
resulting in ERR: mixing aggregate and non-aggregate queries is not supported. Then I found https://docs.influxdata.com/influxdb/v1.6/guides/downsampling_and_retention/, which looked like what I wanted to do. I then created an infinite retention policy (duration 0s) and a continuous query
> CREATE CONTINUOUS QUERY "cq" ON "test" BEGIN
SELECT mean("duration"),"artnum"
INTO infinite.mean_duration
FROM infinite.test
GROUP BY time(1m)
END
I followed the instructions, but after I fed some data to the DB and waited for 1 minute, SELECT * FROM "infinite"."mean_duration" did not return anything.
Is that approach the right one or should I continue somewhere else? The very goal is to see the updated table in grafana, refreshing once a minute.
InfluxDB is a time series database, so you really need the time dimension, also in the response. You will have a hard time with Grafana if your query returns non-time-series data. So don't try to remove time from the query. A better option is to hide time in the Grafana table panel: use column styles and set Type: Hidden.
InfluxDB doesn't have tables, but measurements. I guess you only need a query with proper grouping, no advanced continuous queries, etc. Try and improve this query*:
SELECT
MEAN("duration"),
COUNT("duration")
FROM results
GROUP BY "artnum" fill(null)
*You may have a problem with grouping in your case, because artnum is an InfluxDB field; the better option is to save artnum as an InfluxDB tag.
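For illustration, writing the points with artnum as a tag instead of a field would look roughly like this in line protocol (measurement and field names taken from the question):

# measurement,tag_set field_set [timestamp]
results,artnum=1234 duration=34
results,artnum=1234 duration=81
results,artnum=2345 duration=56
results,artnum=2345 duration=52

GROUP BY "artnum" then groups by the tag value as expected.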

Kafka aggregate single log event lines to a combined log event

I'm using Kafka to process log events. I have basic knowledge of Kafka Connect and Kafka Streams for simple connectors and stream transformations.
Now I have a log file with the following structure:
timestamp event_id event
A log event has multiple log lines which are connected by the event_id (for example a mail log)
Example:
1234 1 START
1235 1 INFO1
1236 1 INFO2
1237 1 END
And in general there are multiple events:
Example:
1234 1 START
1234 2 START
1235 1 INFO1
1236 1 INFO2
1236 2 INFO3
1237 1 END
1237 2 END
The time window (between START and END) could be up to 5 minutes.
As a result, I want a topic like
event_id combined_log
Example:
1 START,INFO1,INFO2,END
2 START,INFO3,END
What are the right tools to achieve this? I tried to solve it with Kafka Streams, but I can't figure out how.
In your use case you are essentially reconstructing sessions or transactions based on message payloads. At the moment there is no built-in, ready-to-use support for such functionality. However, you can use the Processor API part of Kafka's Streams API to implement this functionality yourself. You can write custom processors that use a state store to track when, for a given key, a session/transaction is STARTed, added to, and ENDed.
Some users in the mailing lists have been doing that IIRC, though I am not aware of an existing code example that I could point you to.
What you need to watch out for is to properly handle out-of-order data. In your example above you listed all input data in proper order:
1234 1 START
1234 2 START
1235 1 INFO1
1236 1 INFO2
1236 2 INFO3
1237 1 END
1237 2 END
In practice though, messages/records may arrive out-of-order, like so (I only show messages with key 1 to simplify the example):
1234 1 START
1237 1 END
1236 1 INFO2
1235 1 INFO1
Even if that happens, I understand that in your use case you still want to interpret this data as: START -> INFO1 -> INFO2 -> END rather than START -> END (ignoring/dropping INFO1 and INFO2 = data loss) or START -> END -> INFO2 -> INFO1 (incorrect order, probably also violating your semantic constraints).
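A very rough sketch of such a custom processor (topic, store, and class names are invented, out-of-order handling is left out, and the exact Processor interface differs slightly between Kafka Streams versions):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

// Collects log lines per event_id in a state store and emits the combined log when END arrives
public class LogSessionProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        this.store = (KeyValueStore<String, String>) context.getStateStore("log-sessions");
    }

    @Override
    public void process(String eventId, String line) {
        String soFar = store.get(eventId);
        String combined = (soFar == null) ? line : soFar + "," + line;
        if ("END".equals(line)) {
            context.forward(eventId, combined);   // emit the finished session downstream
            store.delete(eventId);
        } else {
            store.put(eventId, combined);         // keep accumulating until END is seen
        }
    }

    @Override
    public void close() { }
}

And the wiring, somewhere in the application's setup code:

Topology topology = new Topology();
topology.addSource("lines", "log-lines-topic");
topology.addProcessor("sessionizer", LogSessionProcessor::new, "lines");
topology.addStateStore(
    Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("log-sessions"),
                                Serdes.String(), Serdes.String()),
    "sessionizer");
topology.addSink("combined", "combined-log-topic", "sessionizer");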

Creating a trigger for an SNMP item in Zabbix

I have a problem defining a trigger for an SNMP item in Zabbix. The SNMP OID is 'IF-MIB::ifHCInOctets.10001' and the item key is 'inbound_traffic10001'.
The item's "Store value" is Delta (simple change) and the update interval is set to 120 seconds.
I have defined a trigger with the following expression:
{Template SNMP-v2 Switch C3750:inbound_traffic10001.last()}>2000000
I want the trigger to fire if the inbound traffic of port 1 of the switch goes over 2 MB/s.
But the problem is that the traffic goes over 2 MB/s and the trigger does not fire!
Any help is appreciated.
As we discovered through the discussion, the "Store value" setting in the item should be set to Delta (Speed per second).

SQL Server 2008 Provide Value Once for Multiple Instances in Select statement

Thanks in advance for your help.
I'm working with an existing query that is called CustSalesbyRoute. I've made a recent change to it to get a distinct count of Route Stops by Customer and this information is stored in a separate table I created for CustRouteStops. However, the query also includes Product Family so this duplicates my distinct Route-Stop count. I've confirmed that several reports use Product Family from this table to pull various data, otherwise, I would eliminate this column and everything would work great. I also know that this essentially goes against relational database integrity; however, I'm left to work on it, and I'm hoping to have a solution that is equivalent to Excel's FREQUENCY/MATCH functions or SUMPRODUCT, etc.
The question is this: Is there a way to display the Customer Route-Stops value for the first instance of rows that have multiple Product Family values and show 0 for any others for the same Customer/Route combination? For example, below is what the query is doing now if I include Product Family:
CustID  ProdFamily     Route  #ofStops
10001   Dairy          101    14
10001   Dry Groceries  101    14
10001   Meat           101    14
This is what I would like to see happen because I need to leave Product Family in the table:
CustID  ProdFamily     Route  #ofStops
10001   Dairy          101    14   <<< the 1st instance provides the Stops value
10001   Dry Groceries  101    0    <<< remaining rows for the same Cust# and Route show 0
10001   Meat           101    0
Thanks again!
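Not from the original thread, but one common way to get this shape in SQL Server 2008 is ROW_NUMBER(): number the rows per Customer/Route and keep the #ofStops value only on the first one. Table and column names below are assumed from the sample output above:

-- a sketch only; adjust names to the real query/table
SELECT
    CustID,
    ProdFamily,
    Route,
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY CustID, Route ORDER BY ProdFamily) = 1
         THEN [#ofStops]
         ELSE 0
    END AS [#ofStops]
FROM CustSalesbyRoute;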