RX - Notifications at a specified rate - system.reactive

I'm a newbie with RX and I'm facing a problem with "shaping the notifications traffic".
I wonder how I can notify observers at a given throughput; that is, I would like "OnNext" not to be called until a given amount of time has elapsed since the last "OnNext" invocation.
For the sake of completeness: I want every element in the sequence to be notified.
For example, with a rate of 0.2 symbols/tick:
Tick: 0 10 20 30
|---------|---------|---------|
Producer: A---B------C--D-----E-------F
Result: A B C D E F
0 5 11 16 21 28
Is there a way to compose the observable or do I have to implement my own Subject?
Thanks a lot

Yes: just turn each value into an async sequence that does not complete until the delay has elapsed, and then concatenate them.
var delay = Observable.Empty<T>().Delay(TimeSpan.FromSeconds(2));
var rateLimited = source
    .Select(item => Observable.Return(item).Concat(delay))
    .Concat();
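The timing this produces can be checked with a small simulation. The sketch below is not Rx code; it is a plain Python model which assumes that each item is delivered either on arrival or one minimum gap after the previous delivery, whichever is later. The arrival ticks are read off the producer line of the question by counting characters, which is an assumption about how that diagram is meant to be read.

```python
def rate_limited_times(arrivals, min_gap):
    """Return the tick at which each item would be delivered.

    An item goes out when it arrives, unless the previous delivery
    happened less than min_gap ticks earlier, in which case it waits.
    """
    emitted = []
    for t in arrivals:
        if emitted:
            t = max(t, emitted[-1] + min_gap)
        emitted.append(t)
    return emitted


# Producer line from the question; each character is assumed to be one tick.
producer = "A---B------C--D-----E-------F"
arrivals = [i for i, ch in enumerate(producer) if ch != "-"]

# 0.2 symbols/tick means at most one symbol every 5 ticks.
result = rate_limited_times(arrivals, min_gap=5)
print(result)  # [0, 5, 11, 16, 21, 28]
```

Under that reading, the delivery ticks match the Result row in the question.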

KDB - Automatic function argument behavior with Iterators

I'm struggling to understand the behavior of the arguments in the scan function below. I understand the EWMA calculation and have built an Excel worksheet that matches it, but the kdb syntax is throwing me off in terms of what (and when) x, y and z are. I've referenced Q for Mortals, other books and https://code.kx.com/q/ref/over/, and I do understand what's going on in the simpler examples provided.
I understand the EWMA formula based on the Excel calc but how is that translated into the function below?
x = constant, y = passed-in values (but also appears to be the prior result?) and z = (previous period?)
ewma: {{(y*1-x)+(z*x)} [x]\[y]};
ewma [.25; 15 20 25 30 35f]
15 16.25 18.4375 21.32813 24.74609
Rearranging terms makes it easier to read, but if I were to write this in Excel, I would incorrectly reference the y value column in the addition operator instead of correctly referencing the previous EWMA value.
ewma: {{y+x*z-y} [x]\[y]};
ewma [.25; 15 20 25 30 35f]
15 16.25 18.4375 21.32813 24.74609
[Image: EWMA in Excel formula for auditing]
0N! is useful in these cases for seeing which values are passed. Simply add it to the start of the function to display a variable in the console. For example, to show what z is passed in as on each run:
q)ewma: {{0N!z;(y*1-x)+(z*x)} [x]\[y]};
q)ewma [.25; 15 20 25 30 35f]
15f
16.25
18.4375
21.32812
//Or multiple at once
q)ewma: {{0N!(x;y;z);(y*1-x)+(z*x)} [x]\[y]};
q)
q)ewma [.25; 15 20 25 30 35f]
0.25 15 20
0.25 16.25 25
0.25 18.4375 30
0.25 21.32812 35
Edit:
To see why z holds the 'y' values, consider the simplified example below, which uses just x and y.
//two parameters specified in beginning.
//x initialised as 1 then takes the function result for next run
//y takes value of next value in list
q){0N!(x;y);x+y}\[1;2 3 4]
1 2
3 3
6 4
3 6 10
//in this example only one parameter is passed
//but q takes first value in list as x in this special case
q){0N!(x;y);x+y}\[1 2 3 4]
1 2
3 3
6 4
1 3 6 10
A similar thing is happening in your example. x is not being passed to the iterator and therefore keeps the same value on each run.
The inner function's y is initialised with the first value of the outer y variable (15f in this case), as in the simplified example above. z then takes the second value of the list for its initial run. After that, y takes the result of the previous function run and z takes the next value in the list, until the whole list has been passed to the function.
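For readers more at home outside q, the same argument flow can be sketched in Python with itertools.accumulate, which behaves like the scan here: the first list item seeds the running result, and each later item is combined with it. The names alpha, prev and cur are illustrative stand-ins for the q function's x, y and z.

```python
from itertools import accumulate


def ewma(alpha, values):
    """EWMA as a scan: alpha is fixed (q's x), prev is the running
    result (q's y) and cur is the next list item (q's z)."""
    return list(accumulate(values, lambda prev, cur: prev * (1 - alpha) + cur * alpha))


result = ewma(0.25, [15.0, 20.0, 25.0, 30.0, 35.0])
print(result)  # [15.0, 16.25, 18.4375, 21.328125, 24.74609375]
```

The first output is the seed itself (15), which is why the function body only starts running with prev = 15 and cur = 20.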

Siddhi delayed query

I am struggling to understand this query:
from heartbeats#window.time(1 hour) insert expired events into delayedStream;
from every e = heartbeats -> e2 = heartbeats[deviceId == e.deviceId]
or expired = delayedStream[deviceId == e.deviceId]
within 1 hour 10 minutes
select e.deviceId, e2.deviceId as id2, expired.deviceId as id3
insert into tmpStream;
The first query delays all events by 1 hour.
The second query finds all events that occurred 1 hour ago and for which no newer events have been found.
This works, but I don't understand this part:
from every e = heartbeats -> e2 = heartbeats[deviceId == e.deviceId] or expired = delayedStream[deviceId == e.deviceId]
The second part of the query (or expired = ...) checks whether the event with the given deviceId is on the delayedStream. What is the purpose of the first part, and how does it all come together so that this query finds devices that sent no data for more than 1 hour?
I don't think the above query will be accurate if you want to check whether a sensor has not sent a reading for the last 1 hour. I changed the window to 1 minute and sent 2 events:
[2019-07-19 16:48:23,774] heartbeats : Event{timestamp=1563535103772, data=[1], isExpired=false}
[2019-07-19 16:48:24,696] tmpStream : Event{timestamp=1563535104694, data=[1, 1, null], isExpired=false}
[2019-07-19 16:48:24,697] heartbeats : Event{timestamp=1563535104694, data=[1], isExpired=false}
[2019-07-19 16:49:23,774] tmpStream : Event{timestamp=1563535163772, data=[1, null, 1], isExpired=false}
Let's say events arrive at 10:00 and 10:15. The outputs on tmpStream will be at 10:15 (first part of the pattern) and 11:00 (from the delayed stream). The second match is incorrect: for this use case it should fire at 11:15.
To improve the query, you can use Siddhi's pattern for detecting non-occurring events, which will be simpler for your use case: https://siddhi.io/en/v5.0/docs/query-guide/#detecting-non-occurring-events
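The intended behavior in the 10:00 / 10:15 example can be written down as a small Python model. This is not Siddhi and says nothing about how Siddhi evaluates patterns; it only sketches the rule the use case asks for, namely that a device is reported one timeout after a heartbeat that has no successor within that timeout. The function name and the minutes-since-midnight encoding are assumptions made for the illustration.

```python
def silence_alerts(heartbeats, timeout):
    """heartbeats: list of (time, device_id) pairs.

    Returns (device_id, alert_time) pairs, where an alert fires `timeout`
    after any heartbeat that is not followed by another heartbeat from
    the same device within `timeout`.
    """
    by_device = {}
    for t, dev in sorted(heartbeats):
        by_device.setdefault(dev, []).append(t)

    alerts = []
    for dev, times in by_device.items():
        # Pair each heartbeat with its successor (None for the last one).
        for cur, nxt in zip(times, times[1:] + [None]):
            if nxt is None or nxt - cur > timeout:
                alerts.append((dev, cur + timeout))
    return sorted(alerts, key=lambda a: a[1])


# Times in minutes since midnight: 10:00 -> 600, 10:15 -> 615.
alerts = silence_alerts([(600, 1), (615, 1)], timeout=60)
print(alerts)  # [(1, 675)], i.e. a single alert at 11:15
```

The 10:00 heartbeat produces no alert because it is followed by one at 10:15, so the only alert is at 11:15 rather than 11:00.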

Apache Beam - sliding windows for kinesis stream

I am trying to apply a sliding window of 1 hour (3600 seconds, TimeWindowSize) sampled every 5 seconds (TimeWindowSamplingFrequency) to events processed from a Kinesis stream,
but I receive the processed events every 5 seconds, and the pipeline does not seem to apply the 1-hour sliding window to give me the one-hour result of the transform I want.
As I understand it, it should wait, process one hour of events coming in from the Kinesis stream, and then give me an output after 1 hour.
The following is the sample code I used:
pipeline.apply(
        KinesisIO.read()
            .withStreamName(options.getEnrichedSnowplowEventsStreamName())
            .withAWSClientsProvider(new DefaultAWSClientsProvider())
            .withInitialPositionInStream(InitialPositionInStream.LATEST))
    .apply(MapElements.into(TypeDescriptors.strings())
        .via(record -> new String(record.getDataAsBytes())))
    .apply(ParseSnowplowEvents.fromStrings())
    .apply(/* a user-defined ParDo transform that outputs PCollection<Class> objects */)
    .apply(Window.into(
        SlidingWindows.of(Duration.standardSeconds(3600))
            .every(Duration.standardSeconds(5))))
    .apply(/* a user-defined transform with a ParDo that outputs PCollection<KV<Integer, Double>> */)
    .apply(PrintValue.andPassOn());
The user-defined PrintValue.andPassOn() transform prints the data for me. I am expecting the resulting PCollection<KV<Integer, Double>> at the end of the one-hour sliding window, but instead the KV pairs are printed every 5 seconds:
2018-06-17T13:11:29.999Z - KV{101, 5.0}
2018-06-17T13:11:34.999Z - KV{102, 0.4}
2018-06-17T13:11:39.999Z - KV{104, 0.5}
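It is printing as per your sampling frequency: with .every(Duration.standardSeconds(5)), a new one-hour window starts every 5 seconds, so one of them also ends every 5 seconds and emits its result. Change the period to one hour and it should work as expected.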
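The arithmetic behind this can be sketched outside Beam. The Python function below is a simplified model of sliding-window assignment, assuming integer timestamps in seconds and windows that start at multiples of the period; it is not Beam's implementation, and the function name is made up for the illustration.

```python
def sliding_windows(timestamp, size, period):
    """Return the (start, end) windows that contain `timestamp`.

    Windows are assumed to start at multiples of `period` and to
    cover the half-open interval [start, start + size).
    """
    last_start = timestamp - (timestamp % period)
    # Walk backwards over every start that still covers the timestamp.
    starts = range(last_start, timestamp - size, -period)
    return [(s, s + size) for s in starts]


every_5s = sliding_windows(1000, size=3600, period=5)
hourly = sliding_windows(1000, size=3600, period=3600)

print(len(every_5s))  # 720 overlapping windows, ending 5 seconds apart
print(hourly)         # [(0, 3600)]
```

With a 5-second period, one element falls into 720 overlapping windows whose ends are 5 seconds apart, which is why a result appears every 5 seconds. With a one-hour period, each element falls into exactly one window.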

RX and buffering

I'm trying to obtain the following observable (with a buffer capacity of 10 ticks):
Time 0 5 10 15 20 25 30 35 40
|----|----|----|----|----|----|----|----|
Source A B C D E F G H
Result A E H
B F
C G
D
Phase |<------->|-------|<------->|<------->|
B I B B
That is, the behavior is very similar to the Buffer operator, with the difference that the buffering phase is not aligned to fixed time slots but starts at the first symbol pushed during the idle phase. In the example above, the buffering phases start with the 'A', 'E', and 'H' symbols.
Is there a way to compose the observable or do I have to implement it from scratch?
Any help will be appreciated.
Try this:
IObservable<T> source = ...;
IScheduler scheduler = ...;
IObservable<IList<T>> query = source
    .Publish(obs => obs
        .Buffer(() => obs.Take(1).IgnoreElements()
            .Concat(Observable.Return(default(T)).Delay(duration, scheduler))
            .Amb(obs.IgnoreElements())));
The buffer closing selector is called once at the start and then once whenever a buffer closes. The selector says "The buffer being started now should be closed duration after the first element of this buffer, or when the source completes, whichever occurs first."
Edit: Based on your comments, if you want to make multiple subscriptions to query share a single subscription to source, you can do that by appending .Publish().RefCount() to the query.
IObservable<IList<T>> query = source
    .Publish(obs => obs
        .Buffer(() => obs.Take(1).IgnoreElements()
            .Concat(Observable.Return(default(T)).Delay(duration, scheduler))
            .Amb(obs.IgnoreElements())))
    .Publish()
    .RefCount();
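The grouping rule itself (a buffer opens at the first element seen while idle and closes one duration later) can be illustrated without Rx. The Python sketch below works on already-recorded (time, value) pairs rather than a live stream. The timestamps are hypothetical, chosen to resemble the diagram in the question, and an element arriving exactly when a buffer closes is assumed to start the next buffer.

```python
def buffer_from_first(events, duration):
    """Group (time, value) events into buffers.

    A buffer opens at the first event seen while idle and closes
    `duration` ticks later; the next event then opens a new buffer.
    """
    buffers = []
    window_end = None
    for t, value in sorted(events):
        if window_end is None or t >= window_end:
            buffers.append([])          # idle phase over: open a new buffer
            window_end = t + duration
        buffers[-1].append(value)
    return buffers


# Hypothetical arrival ticks for the symbols A..H.
events = [(0, "A"), (3, "B"), (6, "C"), (9, "D"),
          (22, "E"), (25, "F"), (28, "G"), (40, "H")]
buffers = buffer_from_first(events, duration=10)
print(buffers)  # [['A', 'B', 'C', 'D'], ['E', 'F', 'G'], ['H']]
```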

GraphX: Wrong output without cache()

I'm doing the following:
var count = 1L // assumed non-zero start so the loop body runs at least once
while (count > 0) {
  val messages = graph.vertices.flatMap {
    // Create messages for other nodes
  }
  // Cache, which is critical for correct execution
  messages.cache()
  count = messages.count()
  val msgType1 = messages.filter()
  val msgType2 = messages.filter()
  println(count)
  // Should add up to exactly messages.count()
  println(msgType1.count() + msgType2.count())
  println("---")
}
If I execute it exactly like this, the output is:
8
6 2
---
11
3 8
---
0
0 0
---
which add up exactly to the message count.
If I remove the cache() call after the flatMap operation, the filtering of the messages is wrong after counting the messages. It looks like counting clears the messages or something like that.
The output is then:
8
0 0
---
0
0 0
---
Why is that happening? Is it okay that my program only works if I use the cache operation at that point, or should it also work without caching the messages?
My problem was this: if flatMap() was evaluated only once per loop iteration, the output was correct.
If it was evaluated twice in one iteration (which can happen if the messages must be recomputed), the first output was correct and the following ones were not, because my operations inside flatMap() can only be executed once per node, not multiple times.
So if I call cache(), the flatMap is executed only once. Without cache(), it is re-executed for every count() operation, so the first count was correct and the following two were wrong.
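The effect can be reproduced in miniature without Spark. The Python sketch below uses a toy LazyDataset class (a made-up stand-in, not the RDD API) that recomputes its contents on every action unless cache() was called, together with a message generator that, as described above, only produces output the first time it runs for a node.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not Spark's RDD)."""

    def __init__(self, compute):
        self._compute = compute
        self._use_cache = False
        self._data = None

    def cache(self):
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache:
            if self._data is None:
                self._data = self._compute()   # computed once, then reused
            return self._data
        return self._compute()                 # recomputed on every action

    def count(self):
        return len(self.collect())


def make_messages(nodes):
    """Build a compute function with a side effect: each node sends only once."""
    sent = set()

    def compute():
        out = []
        for n in nodes:
            if n not in sent:
                sent.add(n)
                out.append(f"msg-from-{n}")
        return out

    return compute


uncached = LazyDataset(make_messages(["a", "b", "c"]))
uncached_counts = (uncached.count(), uncached.count())

cached = LazyDataset(make_messages(["a", "b", "c"])).cache()
cached_counts = (cached.count(), cached.count())

print(uncached_counts)  # (3, 0)
print(cached_counts)    # (3, 3)
```

Relying on cache() for correctness is fragile in a real cluster, since cached data can be evicted and recomputed; keeping the function passed to flatMap free of one-shot side effects avoids the problem altogether.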