Esper EPL window select not working for a basic example - complex-event-processing

Everything I read says this should work: I need my listener to trigger every 10 seconds with events. What I am getting now is every event in, it a listener trigger. What am I missing? The basic requirements are to create summarized statistics every 10s. Ideally I just want to pump data into the runtime. So, in this example, I would expect a dump of 10 records, once every 10 seconds
class StreamTest {
private final Configuration configuration = new Configuration();
private final EPRuntime runtime;
private final CompilerArguments args = new CompilerArguments();
private final EPCompiler compiler;
public DatadogApplicationTests() {
configuration.getCommon().addEventType(CommonLogEntry.class);
runtime = EPRuntimeProvider.getRuntime(this.getClass().getSimpleName(), configuration);
args.getPath().add(runtime.getRuntimePath());
compiler = EPCompilerProvider.getCompiler();
}
#Test
void testDisplayStatsEvery10S() throws Exception{
// Display stats every 10s about the traffic during those 10s:
EPCompiled compiled = compiler.compile("select * from CommonLogEntry.win:time(10)", args);
runtime.getDeploymentService().deploy(compiled).getStatements()[0].addListener(
(old, newEvents, epStatement, epRuntime) ->
Arrays.stream(old).forEach(e -> System.out.format("%s: received %n", LocalTime.now()))
);
new BufferedReader(new InputStreamReader(this.getClass().getResourceAsStream("/access.log"))).lines().map(CommonLogEntry::new).forEachOrdered(e -> {
runtime.getEventService().sendEventBean(e, e.getClass().getSimpleName());
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ex) {
System.err.println(ex);
}
});
}
}
Which currently outputs every second, corresponding to the sleep in my stream:
11:00:54.676: received
11:00:55.684: received
11:00:56.689: received
11:00:57.694: received
11:00:58.698: received
11:00:59.700: received

A time window is a sliding window. There is a chapter on basic concepts that explains how they work. Here is the link to the basic concepts chapter.
It is not clear what the requirements are but I think what you want to achieve is collecting events for a while and then releasing them. You can draw inspiration from the solution patterns.
This will collect events for 10 seconds.
create schema StockTick(symbol string, price double);
create context CtxBatch start #now end after 10 seconds;
context CtxBatch select * from StockTick#keepall output snapshot when terminated;

Related

Mongo change-Stream with Spring resumeAt vs startAfter and fault tolerance in case of connection loss

Can't find an answer on stackOverflow, nor in any documentation,
I have the following change stream code(listen to a DB not a specific collection)
Mongo Version is 4.2
#Configuration
public class DatabaseChangeStreamListener {
//Constructor, fields etc...
#PostConstruct
public void initialize() {
MessageListenerContainer container = new DefaultMessageListenerContainer(mongoTemplate, new SimpleAsyncTaskExecutor(), this::onException);
ChangeStreamRequest.ChangeStreamRequestOptions options =
new ChangeStreamRequest.ChangeStreamRequestOptions(mongoTemplate.getDb().getName(), null, buildChangeStreamOptions());
container.register(new ChangeStreamRequest<>(this::onDatabaseChangedEvent, options), Document.class);
container.start();
}
private ChangeStreamOptions buildChangeStreamOptions() {
return ChangeStreamOptions.builder()
.returnFullDocumentOnUpdate()
.filter(newAggregation(match(where(OPERATION_TYPE).in(INSERT.getValue(), UPDATE.getValue(), REPLACE.getValue(), DELETE.getValue()))))
.resumeAt(Instant.now().minusSeconds(1))
.build();
}
//more code
}
I want the stream to start listening from system initiation time only, without taking anything prior in the op-log, will .resumeAt(Instant.now().minusSeconds(1)) work?
do I need to use starAfter method if so how can I found the latest resumeToken in the db?
or is it ready out of the box and I don't need to add any resume/start lines?
second question, I never stop the container(it should always live while app is running), In case of disconnection from the mongoDB and reconnection will the listener in current configuration continue to consume messages? (I am having a hard time simulation DB disconnection)
If it will not resume handling events, what do I need to change in the configuration so that the change stream will continue and will take all the event from the last received resumeToken prior to the disconnection?
I have read this great article on medium change stream in prodcution,
but it uses the cursor directly, and I want to use the spring DefaultMessageListenerContainer, as it is much more elegant.
So I will answer my own(some more dumb, some less :)...) questions:
when no resumeAt timestamp provided the change stream will start from current time, and will not draw any previous events.
resumeAfter event vs timestamp difference can be found here: stackOverflow answer
but keep in mind, that for timestamp it is inclusive of the event, so if you want to start from next event(in java) do:
private BsonTimestamp getNextEventTimestamp(BsonTimestamp timestamp) {
return new BsonTimestamp(timestamp.getValue() + 1);
}
In case of internet disconnection the change stream will not resume,
as such I recommend to take following approach in case of error:
private void onException() {
ScheduledExecutorService executorService = newSingleThreadScheduledExecutor();
executorService.scheduleAtFixedRate(() -> recreateChangeStream(executorService), 0, 1, TimeUnit.SECONDS);
}
private void recreateChangeStream(ScheduledExecutorService executorService) {
try {
mongoTemplate.getDb().runCommand(new BasicDBObject("ping", "1"));
container.stop();
startNewContainer();
executorService.shutdown();
} catch (Exception ignored) {
}
}
First I am creating a runnable scheduled task that always runs(but only 1 at a time newSingleThreadScheduledExecutor()), I am trying to ping the DB, after a successful ping I am stopping the old container and starting a new one, you can also pass the last timestamp you took so that you can get all events you might have missed
timestamp retrieval from event:
BsonTimestamp resumeAtTimestamp = changeStreamDocument.getClusterTime();
then I am shutting down the task.
also make sure the resumeAtTimestamp exist in oplog...

Debounce kafka events

I am planning on setting up a MySQL to Kafka flow, with the end goal being to schedule a process to recalculate a mongoDB document based on the changed data.
This might involve directly patching the mongoDB documents, or running a process that will recreate an entire document.
My question is this, if a set of changes to the MySQL database are all related to one mongoDB document, then I don't want to re-run the recalculate process for each change in real time, I want to wait for the changes to 'settle' so that I only run the recalculate process as needed.
Is there a way to 'debounce' the Kafka stream? E.g. is there a well defined pattern for a Kafka consumer that I can use to implement the logic I want?
At present there's no easy way to debounce events.
The problem, in short, is that Kafka doesn't act based on 'wall clock time'. Processing is generally triggered by incoming events (and the data contained therein), not by arbitrary triggers, like system time.
I'll cover why Suppressed and SessionWindows don't work, the proposed solution in KIP-242, and an untested workaround.
Suppressed
Suppressed has a untilTimeLimit() function, but it isn't suitable for debouncing.
If another record for the same key arrives in the mean time, it replaces the first record in the buffer but does not re-start the timer.
SessionWindows
I thought that using SessionWindows.ofInactivityGapAndGrace() might work.
First I grouped, windowed, aggregated, and suppressed the input KStream:
val windowedData: KTable<Windowed<Key>, Value> =
inputTopicKStream
.groupByKey()
.windowedBy(
SessionWindows.ofInactivityGapAndGrace(
WINDOW_INACTIVITY_DURATION,
WINDOW_INACTIVITY_DURATION,
)
)
.aggregate(...)
.suppress(
Suppressed.untilWindowCloses(
Suppressed.BufferConfig.unbounded()
)
)
Then I aggregated the windows, so I could have a final state
windowedData
.groupBy(...)
.reduce(
/* adder */
{ a, b -> a + b },
/* subtractor */
{ a, a -> a - a },
)
However the problem is that SessionWindows will not close without additional records coming up. Kafka will not independently close the window - it requires additional records to realise that the window can be closed, and that suppress() can forward a new record.
This is noted in Confluent's blog https://www.confluent.io/de-de/blog/kafka-streams-take-on-watermarks-and-triggers/
[I]f you stop getting new records wall-clock time will continue to advance, but stream time will freeze. Wall-clock time advances because that little quartz watch in your computer keeps ticking away, but stream time only advances when you get new records. With no new records, stream time is frozen.
KIP-424
KIP-424 proposed an improvement that would allow Suppress to act as a debouncer, but there's been no progress in a couple of years.
Workaround
Andrey Bratus provided a simple workaround in the JIRA ticket for KIP-424, KAFKA-7748. I tried it but it didn't compile - I think the Kafka API has evolved since the workaround was posted. I've updated the code, but I haven't tested it.
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.TimestampedKeyValueStore;
import org.apache.kafka.streams.state.ValueAndTimestamp;
/**
* THIS PROCESSOR IS UNTESTED
* <br>
* This processor mirrors the source, but waits for an inactivity gap before forwarding records.
* <br>
* The suppression is key based. Newer values will replace previous values, and reset the inactivity
* gap.
*/
public class SuppressProcessor<K, V> implements Processor<K, V, K, V> {
private final String storeName;
private final Duration debounceCheckInterval;
private final long suppressTimeoutMillis;
private TimestampedKeyValueStore<K, V> stateStore;
private ProcessorContext<K, V> context;
/**
* #param storeName The name of the {#link TimestampedKeyValueStore} which will hold
* records while they are being debounced.
* #param suppressTimeout The duration of inactivity before records will be forwarded.
* #param debounceCheckInterval How regularly all records will be checked to see if they are
* eligible to be forwarded. The interval should be shorter than
* {#code suppressTimeout}.
*/
public SuppressProcessor(
String storeName,
Duration suppressTimeout,
Duration debounceCheckInterval
) {
this.storeName = storeName;
this.suppressTimeoutMillis = suppressTimeout.toMillis();
this.debounceCheckInterval = debounceCheckInterval;
}
#Override
public void init(ProcessorContext<K, V> context) {
this.context = context;
stateStore = context.getStateStore(storeName);
context.schedule(debounceCheckInterval, PunctuationType.WALL_CLOCK_TIME, this::punctuate);
}
#Override
public void process(Record<K, V> record) {
final var key = record.key();
final var value = record.value();
final var storedRecord = stateStore.get(key);
final var isNewRecord = storedRecord == null;
final var timestamp = isNewRecord ? System.currentTimeMillis() : storedRecord.timestamp();
stateStore.put(key, ValueAndTimestamp.make(value, timestamp));
}
private void punctuate(long timestamp) {
try (var iterator = stateStore.all()) {
while (iterator.hasNext()) {
KeyValue<K, ValueAndTimestamp<V>> storedRecord = iterator.next();
if (timestamp - storedRecord.value.timestamp() > suppressTimeoutMillis) {
final var record = new Record<>(
storedRecord.key,
storedRecord.value.value(),
storedRecord.value.timestamp()
);
context.forward(record);
stateStore.delete(storedRecord.key);
}
}
}
}
}
If you are using a Kafka Streams app, you could try to use suppress
It is designed for WindowedKStream and KTable to "hold back" an update and very useful for rate limiting or notification at the end of a window.
There is a quite useful explanation on https://www.confluent.de/blog/kafka-streams-take-on-watermarks-and-triggers/

Send message to Kafka when SessionWindows was started and ended

I want to send a message to the Kafka topic when new SessionWindow was created and when was ended. I have the following code
stream
.filter(user -> user.isAdmin)
.keyBy(user -> user.username)
.window(ProcessingTimeSessionWindows.withGap(Time.seconds(10)))
//what now? Trigger?
Now I want to send message when new session was started (with some metadata like web browser and timestamps, these informations are available in each element of stream) and send message to Kafka when session was ended (in this example 10 seconds after last element I think) with number of total requests.
It's possible in Flink? I think I should use some trigger but I don't know how and I can't find any example.
If You want to do this when the window is processed, then You can simply use the WindowProcessFunction, basically what You need to do is to add .process(new MyProcessFunction() to Your code. In the ProcessFunction You can have access to the whole window including its first (start) and last (end) element. You can simply use the Side output to just output the beginning and the end of the given window. You can then create a stream from side output and sink it to Kafka. More on Side outputs can be found here.
You can write a custom window trigger.
How to tell a new session is started?
You can create a ValueState with the default value to null, so in case the state value is null, it is a session start.
When the session ended?
Just before TriggerResult.FIRE.
Here is a demo based on ProcessingTimeTrigger of Flink, I only put the question-related logics here, you can check other details from the source code.
public class MyProcessingTimeTrigger extends Trigger<Object, TimeWindow> {
// a state which keeps a session start.
private final ValueStateDescriptor<Long> stateDescriptor = new ValueStateDescriptor<Long>("session-start", Long.class);
#Override
public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
ValueState<Long> state = ctx.getPartitionedState(stateDescriptor);
if(state.value() == null) {
// if value is null, it's a session start.
state.update(window.getStart());
}
ctx.registerProcessingTimeTimer(window.maxTimestamp());
return TriggerResult.CONTINUE;
}
#Override
public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
// here is a session end.
return TriggerResult.FIRE;
}
#Override
public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
ctx.getPartitionedState(stateDescriptor).clear();
ctx.deleteProcessingTimeTimer(window.maxTimestamp());
}
}

Kafka listener, get all messages

Good day collegues.
I have Kafka project using Spring Kafka what listen a definite topic.
I need one time in a day listen all messages, put them into a collection and find specific message there.
I couldn't understand how to read all messages in one #KafkaListener method.
My class is:
#Component
public class KafkaIntervalListener {
public CountDownLatch intervalLatch = new CountDownLatch(1);
private final SCDFRunnerService scdfRunnerService;
public KafkaIntervalListener(SCDFRunnerService scdfRunnerService) {
this.scdfRunnerService = scdfRunnerService;
}
#KafkaListener(topics = "${kafka.interval-topic}", containerFactory = "intervalEventKafkaListenerContainerFactory")
public void intervalListener(IntervalEvent event) throws UnsupportedEncodingException, JSONException {
System.out.println("Recieved interval message: " + event);
IntervalType type = event.getType();
Instant instant = event.getInterval();
List<IntervalEvent> events = new ArrayList<>();
events.add(event);
events.size();
this.intervalLatch.countDown();
}
}
My events collection always has size = 1;
I tried to use different loops, but then, my collection become filed 530 000 000 times the same message.
UPDATE:
I have found a way to do it with factory.setBatchListener(true); But i need to find launch it with #Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow"). Right now this method is always is listening. Now iam trying something like this:
#Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow")
public void run() throws Exception {
kafkaIntervalListener.intervalLatch.await();
}
It doesn't work, in debug mode my breakpoint never works on this site.
The listener container is, by design, message-driven.
For fetching messages on-demand, it's better to use the Kafka Consumer API directly and fetch messages using the poll() method.

Test Event expiration in Drools Fusion CEP

Ciao, I have tested in several ways, but I'm still unable to test and verify the Event expiration mechanism in Drools Fusion, so I'm looking for some little guidance, please?
I've read the manual and I'm interested in this feature:
In other words, one an event is inserted into the working memory, it is possible for the engine to find out when an event can no longer match other facts and automatically retract it, releasing its associated resources.
I'm using the Drools IDE in Eclipse, 5.4.0.Final and I modified the template code created by the "New Drools Project" wizard to test and verify for Event expiration.
The code below. The way I understood to make the "lifecycle" to work correctly is that:
You must setup the KBase in STREAM mode - check
You must Insert the Events in temporal order - check
You must define temporal constraints between Events - check in my case is last Message()
However, when I inspect the EventFactHandle at the end, none of the Event() has expired.
Thanks for your help.
Java:
public class DroolsTest {
public static final void main(String[] args) {
try {
KnowledgeBase kbase = readKnowledgeBase();
// I do want the pseudo clock
KnowledgeSessionConfiguration conf = KnowledgeBaseFactory.newKnowledgeSessionConfiguration();
conf.setOption(ClockTypeOption.get("pseudo"));
StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession(conf, null);
SessionPseudoClock clock = ksession.getSessionClock();
KnowledgeRuntimeLogger logger = KnowledgeRuntimeLoggerFactory.newFileLogger(ksession, "test");
// Insert of 2 Event:
Message message = new Message();
message.setMessage("Message 1");
message.setStatus(Message.HELLO);
ksession.insert(message);
ksession.fireAllRules();
clock.advanceTime(1, TimeUnit.DAYS);
Message message2 = new Message();
message2.setMessage("Message 2");
message2.setStatus(Message.HELLO);
ksession.insert(message2);
ksession.fireAllRules();
clock.advanceTime(1, TimeUnit.DAYS);
ksession.fireAllRules();
// Now I do check what I have in the working memory and if EventFactHandle if it's expired or not:
for (FactHandle f : ksession.getFactHandles()) {
if (f instanceof EventFactHandle) {
System.out.println(((EventFactHandle)f)+" "+((EventFactHandle)f).isExpired());
} else {
System.out.println("not an Event: "+f);
}
}
logger.close();
} catch (Throwable t) {
t.printStackTrace();
}
}
private static KnowledgeBase readKnowledgeBase() throws Exception {
KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
kbuilder.add(ResourceFactory.newClassPathResource("Sample.drl"), ResourceType.DRL);
KnowledgeBuilderErrors errors = kbuilder.getErrors();
if (errors.size() > 0) {
for (KnowledgeBuilderError error: errors) {
System.err.println(error);
}
throw new IllegalArgumentException("Could not parse knowledge.");
}
KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());
// following 2 lines is the template code modified for STREAM configuration
KnowledgeBaseConfiguration config = KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
config.setOption( EventProcessingOption.STREAM );
return kbase;
}
/*
* This is OK from template, as from the doc:
* By default, the timestamp for a given event is read from the Session Clock and assigned to the event at the time the event is inserted into the working memory.
*/
public static class Message {
public static final int HELLO = 0;
public static final int GOODBYE = 1;
private String message;
private int status;
public String getMessage() {
return this.message;
}
public void setMessage(String message) {
this.message = message;
}
public int getStatus() {
return this.status;
}
public void setStatus(int status) {
this.status = status;
}
}
}
Drools:
package com.sample
import com.sample.DroolsTest.Message;
declare Message
#role(event)
end
declare window LastMessageWindow
Message() over window:length(1)
end
rule "Hello World"
when
accumulate( $m : Message(status==Message.HELLO) from window LastMessageWindow,
$messages : collectList( $m ) )
then
System.out.println( ((Message)$messages.get(0)).getMessage() );
end
Please note: even if I add expiration of 1second to the Message event, by
#expires(1s)
I still don't get the expected result that the very first Message event inserted, I would have expected is now expired? Thanks for your help.
Found solution! Obviously it was me being stupid and not realizing I was using Drools 5.4.0.Final while still referring to old documentation of 5.2.0.Final. In the updated documentation for Drools Fusion 5.4.0.Final, this box is added for 2.6.2. Sliding Length Windows:
Please note that length based windows do not define temporal constraints for event expiration from the session, and the engine will not consider them. If events have no other rules defining temporal constraints and no explicit expiration policy, the engine will keep them in the session indefinitely.
Therefore the 3rd requirement I originally enlisted of "You must define temporal constraints between Events" is obviously NOT met because I now understand Sliding Length Window in Drools 5.4.0.Final:
Message() over window:length(1)
are indeed NOT a definition of a temporal constraints for event expiration from the session.
Updating this answer hopefully somebody will find it helpful. Also, just so for your know, me being stupid actually for relying on googling in order to reach the doc, and sometimes you don't get redirected to the current release documentation, so it seems...