Does `persistentEntityRegistry.eventStream` really take at least ~8-12 seconds to get triggered?

I just want to know the possible reasons why my persistentEntityRegistry.eventStream takes approximately 8-12 seconds to emit events.

I have just figured out that it's Cassandra that introduces the delay. What I've done is set cassandra-query-journal.eventual-consistency-delay to 200ms.
My references are the following:
https://groups.google.com/forum/#!topic/lagom-framework/cLXf6r5Ouw4
https://groups.google.com/forum/#!topic/akka-user/TH8hL-A8I4k/discussion
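For reference, the override lives in application.conf. A minimal sketch (200ms is the value mentioned above; lowering the delay increases the polling load on Cassandra):

# application.conf
cassandra-query-journal {
  # How long the query-side journal waits before assuming events have become
  # visible in Cassandra. The default is several seconds, which accounts for
  # most of the observed delay.
  eventual-consistency-delay = 200ms
}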

Related

Delay fixed window from triggering for several minutes

I am using fixed windows in Apache Beam. The watermark is set by the event time.
Some data may arrive out of order, after the window has closed.
How can a trigger be defined in Java to fire, say, 2 minutes after the last data was seen?
It's not entirely clear what behavior you expect. For example, what do you expect to happen if more data arrives within the two minutes? Do you want to restart the two-minute interval or not? Re-emit the data or not?
Looks like the trigger you are trying to describe is something along these lines:
1. wait until the watermark passes the end of the window, in event time;
2. wait for an additional 2 minutes in processing time;
3. emit the data.
If step 2 were in event time, i.e. you wanted to re-emit the window whenever a late element arrives that fits within window + 2min, then you could use withAllowedLateness(). That sounds different from what you want, though, because it can keep re-emitting the window contents every time a matching late element arrives.
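For illustration, here is roughly what that event-time variant looks like; this is a sketch only, and the MyEvent element type and 10-minute window size are assumptions:

import org.apache.beam.sdk.transforms.windowing.*;
import org.joda.time.Duration;

// input is a PCollection<MyEvent>; 10-minute fixed windows are assumed.
input.apply(
    Window.<MyEvent>into(FixedWindows.of(Duration.standardMinutes(10)))
        // keep the window around for 2 extra minutes of event time
        .withAllowedLateness(Duration.standardMinutes(2))
        .triggering(
            AfterWatermark.pastEndOfWindow()
                // re-fires on every late element, per the caveat above
                .withLateFirings(AfterPane.elementCountAtLeast(1)))
        .accumulatingFiredPanes());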
With processing time in step 2, this is not possible in general with the basic triggers available in Beam. You can probably achieve the behavior you want if you manually manage state and timers in your own ParDo: watch the incoming elements, keep track of them in state, and then emit what you want when a timer fires (see the sketch below). This can become very complicated and might still not be flexible enough for your specific use case.
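A minimal sketch of that state-and-timers approach, assuming a keyed input of KV<String, String>; the class name and the state/timer ids are made up for illustration:

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.state.*;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class EmitAfterQuietPeriod extends DoFn<KV<String, String>, List<String>> {

  // Buffer of elements seen so far for the current key.
  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

  // Processing-time timer, re-armed on every element, so it only fires
  // 2 minutes after the last element was seen for this key.
  @TimerId("quiet")
  private final TimerSpec quietSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void process(
      ProcessContext ctx,
      @StateId("buffer") BagState<String> buffer,
      @TimerId("quiet") Timer quiet) {
    buffer.add(ctx.element().getValue());
    quiet.offset(Duration.standardMinutes(2)).setRelative();
  }

  @OnTimer("quiet")
  public void onQuiet(
      OnTimerContext ctx,
      @StateId("buffer") BagState<String> buffer) {
    List<String> elements = new ArrayList<>();
    buffer.read().forEach(elements::add);
    ctx.output(elements);
    buffer.clear();
  }
}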
One of the major problems is that there is no good way to define processing-time triggers in Beam in general. It would be complicated to define a general mechanism for working with timers in this manner. For example, when you want to express "wait for 2 minutes", the framework needs to understand what these two minutes are relative to, i.e. when to start the timer, so you need a mechanism to express that as well. And with composition, continuation, and other complications, this doesn't seem easy to reason about. So it's not in the framework in this general form.
In order to implement just the "wait for 2 minutes after the last element was seen in the window", the framework has to watch for that element and set the timer. Technically it is possible to do something like this, but it doesn't seem like anyone has done it yet.
There seems to be only one meaningful processing-time trigger available in Beam, but it's not generic enough and doesn't do what you want. You can look at composite triggers like AfterFirst or AfterAll, but they likely won't help you without a better general processing-time trigger.
I decided against using Beam and implemented the solution in Kafka Streams instead.
I basically grouped by key, then used fixed windows, and then aggregated the result.
The "grace" on the window allows data to arrive late.
import java.time.Duration;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.WindowStore;

// Group by key, then window into fixed (tumbling) windows: advance == size.
KGroupedStream<Long, OxyStreamItem> grouped = input.groupByKey();

TimeWindowedKStream<Long, OxyStreamItem> windowed =
    grouped.windowedBy(
        TimeWindows.of(WIN_SIZE)
            .advanceBy(WIN_SIZE)
            .grace(Duration.ofSeconds(5L))); // allow data up to 5s late

return windowed
    .aggregate(
        makeInitializer(),
        makeAggregator(),
        Materialized
            .<Long, Aggregate, WindowStore<Bytes, byte[]>>as("tmp")
            .withValueSerde(new AggregateSerde()))
    // emit each window exactly once, after its grace period has passed
    .suppress(
        Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream()
    .map(calculateAvg());

Time since a value was zero

I have an application that consumes work from an AWS topic. Work is added several times a day; my application quickly consumes it, and the queue length goes back to 0. I am able to produce a metric for the length of the queue.
I would like a metric for the time since the queue length was last zero. Any ideas how to get started?
Assuming a queue_size gauge that records the size of the queue, you can define a recording rule like this:
# Timestamp of the most recent `queue_size` == 0 sample; else propagate the previous value
- record: last_empty_queue_timestamp
expr: timestamp(queue_size == 0) or last_empty_queue_timestamp
Then you can compute the time since the last time the queue was empty as simply as:
timestamp(queue_size) - last_empty_queue_timestamp
Note however that because this is a gauge (and because of the limitations of sampling), you may end up with weird results. E.g. if one work item is added every minute, your scrape interval is one minute, and you always happen to sample right after a work item has been added, then your queue may never (or only rarely) appear empty from the point of view of Prometheus. If that turns out to be an issue (or simply a concern), you may be better off having your application export a metric that is the last timestamp at which something was added to an empty queue (basically what the recording rule above attempts to compute).
Similar to Alin's answer; upon revisiting this problem I found this in the Prometheus documentation:
https://prometheus.io/docs/practices/instrumentation/#timestamps,-not-time-since
If you want to track the amount of time since something happened, export the Unix timestamp at which it happened - not the time since it happened.
With the timestamp exported, you can use the expression time() - my_timestamp_metric to calculate the time since the event, removing the need for update logic and protecting you against the update logic getting stuck.
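In code, that can look like this with the Prometheus Java client (simpleclient); the metric name and the hook into the queue-polling logic are made up for illustration:

import io.prometheus.client.Gauge;

static final Gauge lastEmptyQueueTimestamp = Gauge.build()
    .name("queue_last_empty_timestamp_seconds")
    .help("Unix timestamp at which the queue was last observed empty.")
    .register();

// Call this wherever the application already measures the queue length.
static void recordQueueSize(int queueSize) {
  if (queueSize == 0) {
    lastEmptyQueueTimestamp.setToCurrentTime(); // gauge = now, in seconds
  }
}

The time since the queue was last empty is then simply time() - queue_last_empty_timestamp_seconds in PromQL.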

Designs for counting occurrences of events in stream processing?

The following discussion is in the context of Apache Flink:
Imagine that we have a keyed stream whose key is its id and whose event time is its timestamp, and we want to calculate, for each event, how many events arrived within the preceding 10 minutes.
The problems that need to be solved are:
How to design the window?
We can create a window of 10 minutes after each event arrives, but this means that each result is delayed by 10 minutes, because we have to wait for the window to close.
We can create a window of 10 minutes that takes the timestamp of each event as the maximum timestamp in the window, which means we don't need to wait for 10 minutes, because we take the last 10 minutes of elements before the element arrives. But this kind of window is not easy to define, as far as I know.
How to deal with memory and other resource issues? Even if we succeed in creating such a window, the event ids may be very diverse, so there would be many windows like this; how does the system keep their state in memory? There is a real risk of running out of memory.
Maybe there are some problems that I haven't mentioned here, or maybe there are good solutions other than windows (e.g. patterns). If you have a good solution, please give me a clue, thank you.
You could do this with a GlobalWindow, a Trigger that fires on every event, and an Evictor that removes events that are more than 10 minutes old before counting the remaining events. (A naive implementation could easily perform very poorly, however.)
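A minimal sketch of that combination, assuming a hypothetical Event type with a String getId() and timestamps already assigned:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.evictors.TimeEvictor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

// events is a DataStream<Event>.
events
    .keyBy(Event::getId)
    .window(GlobalWindows.create())
    .trigger(CountTrigger.of(1))                 // fire on every element
    .evictor(TimeEvictor.of(Time.minutes(10)))   // drop elements older than 10 min
    .process(
        new ProcessWindowFunction<Event, Long, String, GlobalWindow>() {
          @Override
          public void process(String key, Context ctx,
                              Iterable<Event> remaining, Collector<Long> out) {
            long count = 0;
            for (Event ignored : remaining) {
              count++;
            }
            out.collect(count); // events seen in the last 10 minutes for this key
          }
        });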
Yes, this may require keeping a lot of state -- you'll be keeping every event from the past 10 minutes (well, you only need to store the timestamp of each event). If you set up the RocksDB state backend then Flink will spill to disk if need be, but with an obvious performance penalty. It's probably better to use a cluster large enough to hold 10 minutes of traffic in memory. Even at one million events per second, each with a 32-bit timestamp, that's only 2.4GB in 10 minutes (1 million events per second x 600 seconds x 4 bytes per event) -- that doesn't seem like a problem at all.

Gatling during explanation

I'm using Gatling, and I want to repeat a command for an hour, so I see there's a during operator.
The documentation isn't clear enough:
during
.during(duration, counterName, exitASAP) {
  myChain
}
duration can be an Int for a duration in seconds, or a duration expressed like 500 milliseconds.
My question is: will during execute the task once within a 1-hour duration, or will it repeat the task for an hour?
I know we have the repeat operator as well, but that would require me to know how much time my task takes to finish and then calculate the number of repeats.
The code in the during block will be repeated for the duration you give.
When the scenario is executed, the iterations of the during block run until the duration has elapsed.
Using .during() with an amount of seconds is the way to repeat: an injection profile can constantly increase the users, ramp them, or split them, but it cannot repeat the task.
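To make the repetition concrete, here is a minimal sketch using Gatling's Java DSL (available since Gatling 3.7); the base URL and request name are made up:

import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.*;
import io.gatling.javaapi.http.*;
import java.time.Duration;

public class RepeatForAnHourSimulation extends Simulation {

  HttpProtocolBuilder httpProtocol = http.baseUrl("http://localhost:8080");

  // during(...) repeats the inner chain until one hour has elapsed;
  // it does not stretch a single execution out to one hour.
  ScenarioBuilder scn =
      scenario("repeat-for-an-hour")
          .during(Duration.ofHours(1))
          .on(exec(http("my request").get("/")));

  public RepeatForAnHourSimulation() {
    setUp(scn.injectOpen(atOnceUsers(1))).protocols(httpProtocol);
  }
}

The equivalent in the Scala DSL quoted above is during(1.hour) { myChain }.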

Quartz triggers are firing ahead of time

We are using Quartz 2.1.5 on 64-bit machines (clustered, 2 instances, 16GB RAM). We have around 8000 triggers in the system.
Around 50 triggers fire every second.
org.quartz.threadPool.threadCount = 50
org.quartz.scheduler.batchTriggerAcquisitionMaxCount=100
org.quartz.scheduler.idleWaitTime=15000
#org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow=0 (this is not set)
Quartz is able to handle the load, but triggers get fired ahead of time. Why is that?
batchTriggerAcquisitionMaxCount - can we increase it to 500 and keep batchTriggerAcquisitionFireAheadTimeWindow at 1000 (1 sec)? Is there any disadvantage to this configuration?
Any other way?
With the following configuration, it seems to work fine:
org.quartz.threadPool.threadCount = 100
org.quartz.scheduler.batchTriggerAcquisitionMaxCount=500
org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow=1000
org.quartz.scheduler.idleWaitTime=25000
When Quartz wants to run your triggers, it calls this method:
triggers = qsRsrcs.getJobStore().acquireNextTriggers(now + idleWaitTime, Math.min(availThreadCount, qsRsrcs.getMaxBatchSize()), qsRsrcs.getBatchTimeWindow());
Where:
idleWaitTime is org.quartz.scheduler.idleWaitTime
availThreadCount is the number of free threads (will be less than or equal to org.quartz.threadPool.threadCount)
qsRsrcs.getMaxBatchSize() is org.quartz.scheduler.batchTriggerAcquisitionMaxCount
qsRsrcs.getBatchTimeWindow() is org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow
This leads to an SQL request like:
SELECT * FROM TRIGGERS WHERE NEXT_FIRE_TIME <= now + idleWaitTime + qsRsrcs.getBatchTimeWindow() LIMIT Math.min(availThreadCount, qsRsrcs.getMaxBatchSize())
So yes, Quartz always acquires triggers ahead of time, by up to idleWaitTime + qsRsrcs.getBatchTimeWindow(). It acquires the fewest triggers ahead of time if you set getBatchTimeWindow to zero and idleWaitTime to 1000 (its minimum value); even then it will still take triggers that are due up to 1 second in the future, in addition to those that are already due to run.
If you want to stop taking triggers ahead of time completely, you can set batchTriggerAcquisitionMaxCount to 1. The downside in this case is that you can end up with too many SQL requests. You can try playing with the batchTriggerAcquisitionMaxCount parameter and find the value that suits you best.
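For example, a configuration along those lines (the values are illustrative, not from the original answer) minimizes firing ahead of time at the cost of more SQL requests:

# quartz.properties
org.quartz.scheduler.batchTriggerAcquisitionMaxCount = 1
org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow = 0
# idleWaitTime cannot go below its minimum of 1000 ms
org.quartz.scheduler.idleWaitTime = 1000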
By the way, looking at the Quartz code, you can see that setting batchTriggerAcquisitionMaxCount larger than threadCount makes no sense.