Callback when the query has finished processing in Siddhi - complex-event-processing

I am writing a small CEP program using Siddhi. I can add a callback whenever a given filter outputs a data like this
executionPlanRuntime.addCallback("query1", new QueryCallback() {
#Override
public void receive(long timeStamp, Event[] inEvents, Event[] removeEvents) {
EventPrinter.print(inEvents);
System.out.println("data received after processing");
}
});
but is there is a way to know that the filter has finished processing and it won't give any more of the above callback. Something like didFinish. I think that would be the ideal place for shutting down SiddhiManager and ExecutionPlanRuntime instances.

No. There in no such functionality and can't be supported in the future also. Rationale behind that is, in real time stream processing queries will process the incoming stream and emit an output stream. There is no concept as 'finished processing'. Query will rather process event as long as there is input.
Since your requirement is to shutdown SiddhiManager and ExecutionPlanRuntime, recommended way is to do this inside some cleaning method of your program. Or else you can write some java code inside callback to count responses or time wait and call shutdown. Hope this helps!!

Related

rxdart: Get buffered elements on stream subscription cancel

I'm using a rxdart ZipStream within my app to combine two streams of incoming bluetooth data. Those streams are used along with "bufferCount" to collect 500 elements each before emitting. Everything works fine so far, but if the stream subscription gets cancelled at some point, there might be a number of elements in those buffers that are omitted after that. I could wait for a "buffer cycle" to complete before cancelling the stream subscription, but as this might take some time depending on the configured sample rate, I wonder if there is a solution to get those buffers as they are even if the number of elements might be less than 500.
Here is some simplified code for explanation:
subscription = ZipStream.zip2(
streamA.bufferCount(500),
streamB.bufferCount(500),
(streamABuffer, streamBBuffer) {
return ...;
},
).listen((data) {
...
});
Thanks in advance!
So for anyone wondering: As bufferCount is implemented with BufferCountStreamTransformer which extends BackpressureStreamTransformer, there is a dispatchOnClose property that defaults to true. That means if the underlying stream whose emitted elements are buffered is closed, then the remaining elements in that buffer are emitted finally. This also applies to the example above. My fault was to close the stream and to cancel the stream subscription instantly. With awaiting the stream's closing and cancelling the stream subscription afterwards, everything works as expected.

Use kafka to detect changes on values

I have a streaming application that continuously takes in a stream of coordinates along with some custom metadata that also includes a bitstring. This stream is produced onto a kafka topic using producer API. Now another application needs to process this stream [Streams API] and store the specific bit from the bit string and generate alerts when this bit changes
Below is the continuous stream of messages that need to be processed
{"device_id":"1","status_bit":"0"}
{"device_id":"2","status_bit":"1"}
{"device_id":"1","status_bit":"0"}
{"device_id":"3","status_bit":"1"}
{"device_id":"1","status_bit":"1"} // need to generate alert with change: 0->1
{"device_id":"3","status_bits":"1"}
{"device_id":"2","status_bit":"1"}
{"device_id":"3","status_bits":"0"} // need to generate alert with change 1->0
Now I would like to write these alerts to another kafka topic like
{"device_id":1,"init":0,"final":1,"timestamp":"somets"}
{"device_id":3,"init":1,"final":0,"timestamp":"somets"}
I can save the current bit in the state store using something like
streamsBuilder
.stream("my-topic")
.mapValues((key, value) -> value.getStatusBit())
.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
.reduce((oldAggValue, newMessageValue) -> newMessageValue, Materialized.as("bit-temp-store"));
but I am unable to understand how can I detect this change from the existing bit. Do I need to query the state store somehow inside the processor topology? If yes? How? If no? What else could be done?
Any suggestions/ideas that I can try(maybe completely different from what I am thinking) are also appreciated. I am new to Kafka and thinking in terms of event driven streams is eluding me.
Thanks in advance.
I am not sure this is the best approach, but in the similar task I used an intermediate entity to capture the state change. In your case it will be something like
streamsBuilder.stream("my-topic").groupByKey()
.aggregate(DeviceState::new, new Aggregator<String, Device, DeviceState>() {
public DeviceState apply(String key, Device newValue, DeviceState state) {
if(!newValue.getStatusBit().equals(state.getStatusBit())){
state.setChanged(true);
}
state.setStatusBit(newValue.getStatusBit());
state.setDeviceId(newValue.getDeviceId());
state.setKey(key);
return state;
}
}, TimeWindows.of(…) …).filter((s, t) -> (t.changed())).toStream();
In the resulting topic you will have the changes. You can also add some attributes to DeviceState to initialise it first, depending whether you want to send the event, when the first device record arrives, etc.

Function now executing properly after subscribe

I am having a Mono object, On which I have subscribed for doOnsuccess, In this method again I am saving the data in DB(CouchBase Using ReactiveCouchbaseRepository). after that, I am not getting any logs for Line1 and line2.
But this is working fine if I do not save this object, means I am getting logs for line 2.
Mono<User> result = context.getPayload(User.class);
result.doOnSuccess( user -> {
System.out.println("############I got the user"+user);
userRepository.save(user).doOnSuccess(user2->{
System.out.println("user saved"); // LINE 1
}).subscribe();
System.out.println("############"+user); // LINE2
}).subscribe();
Your code snippet is breaking a few rules you should follow closely:
You should not call subscribe from within a method/lambda that returns a reactive type such as Mono or Flux; this will decouple the execution from the main task while they'll both still operate on that shared state. This often ends up on issues because things are trying to read twice the same stream. It's a bit like you're trying to create two separate threads that try reading on the same outputstream.
you should not do I/O operations in doOnXYZ operators. Those are "side-effects" operators, meaning they are useful for logging, increment counters.
What you should try instead to chain Reactor operators to create a single reactive pipeline and return the reactive type for the final client to subscribe on it. In a Spring WebFlux application, the HTTP clients (through the WebFlux engine) are subscribing.
Your code snippet could look like this:
Mono<User> result = context.getPayload(User.class)
.doOnSuccess(user -> System.out.println("############Received user "+user))
.flatMap(user -> {return userRepository.save(user)})
.doOnSuccess(user -> System.out.println("############ Saved "+user));
return result;

Spring Reactor | Batching the input without mutating

I'm trying to batch the records constantly emitted from a streaming source (Kafka) and call my service in a batch of 100.
What I get as the input is a single record. I'm trying what's the best way to achieve it in the Reactive way using Spring Reactor without having to have a mutation and locking outside the pipeline.
Here is my naive attempt which simply reflects my sequential way of thinking:
Mono.just(input)
.subscribe(i -> {
batches.add(input);
if(batches.size() >= 100) {
// Invoke another reactive pipeline.
// Clear the batch (requires locking in order to be thread safe).
}
});
What's the best way to achieve batching on a streaming source using reactor.
.buffer(100) or bufferTimeout(100, Duration.ofSeconds(xxx) comes to the rescue
Using Flux.buffer or Flux.bufferTimeout you will be capable of gathering the fixed amount of elements into the List
StepVerifier.create(
Flux.range(0, 1000)
.buffer(100)
)
.expectNextCount(10)
.expectComplete()
.verify()
Update for the use case
In case, when the input is a single value, suppose like an invocation of the method with parameter:
public void invokeMe(String element);
You may adopt UnicastProcessor technique and transfer all data to that processor so then it will take care of batching
class Batcher {
final UnicastProcessor processor = UnicastProcessor.create();
public void invokeMe(String element) {
processor.sink().next(element);
// or Mono.just(element).subscribe(processor);
}
public Flux<List<String>> listen() {
return processor.bufferTimeout(100, Duration.ofSeconds(5));
}
}
Batcher batcher = new Batcher();
StepVerifier.create(
batcher.listen()
)
.then(() -> Flux.range(0, 1000)
.subscribe(i -> batcher.invokeMe("" + i)))
.expectNextCount(10)
.thenCancel()
.verify()
From that example, we might learn how to provide a single point of receiving events and then listen to results of the batching process.
Please note that UnicastPorcessor allows only one subscriber, so it will be useful for the model when there is one interested party in batching results and many data producers. In a case when you have subscribers as many as producers you may want to use one of the next processors -> DirectProcessor, TopicProcessor, WorkerQueueProcessor. To learn more about Reactor Processors follow the link

Rx Extensions - Proper way to use delay to avoid unnecessary observables from executing?

I'm trying to use delay and amb to execute a sequence of the same task separated by time.
All I want is for a download attempt to execute some time in the future only if the same task failed before in the past. Here's how I have things set up, but unlike what I'd expect, all three downloads seem to execute without delay.
Observable.amb([
Observable.catch(redditPageStream, Observable.empty()).delay(0 * 1000),
Observable.catch(redditPageStream, Observable.empty()).delay(30 * 1000),
Observable.catch(redditPageStream, Observable.empty()).delay(90 * 1000),
# Observable.throw(new Error('Failed to retrieve reddit page content')).delay(10000)
# Observable.create(
# (observer) ->
# throw new Error('Failed to retrieve reddit page content')
# )
]).defaultIfEmpty(Observable.throw(new Error('Failed to retrieve reddit page content')))
full code can be found here. src
I was hoping that the first successful observable would cancel out the ones still in delay.
Thanks for any help.
delay doesn't actually stop the execution of what ever you are doing it just delays when the events are propagated. If you want to delay execution you would need to do something like:
redditPageStream.delaySubscription(1000)
Since your source is producing immediately the above will delay the actual subscription to the underlying stream to effectively delay when it begins producing.
I would suggest though that you use one of the retry operators to handle your retry logic though rather than rolling your own through the amb operator.
redditPageStream.delaySubscription(1000).retry(3);
will give you a constant retry delay however if you want to implement the linear backoff approach you can use the retryWhen() operator instead which will let you apply whatever logic you want to the backoff.
redditPageStream.retryWhen(errors => {
return errors
//Only take 3 errors
.take(3)
//Use timer to implement a linear back off and flatten it
.flatMap((e, i) => Rx.Observable.timer(i * 30 * 1000));
});
Essentially retryWhen will create an Observable of errors, each event that makes it through is treated as a retry attempt. If you error or complete the stream then it will stop retrying.