I need to debounce an input stream.
At the first occurrence of state 1 I need to wait for 5 seconds and verify that the last state was also 1.
Only then do I have a stable signal.
(time) 0-1-2-3-4-5-6-7-8-9
(state) 0-0-0-0-0-1-0-1-0-1
(result) -> 1
Here is an example of a non-stable signal.
(time) 0-1-2-3-4-5-6-7-8-9
(state) 0-0-0-0-0-1-0-1-0-0
(result) -> 0
I tried using a buffer, but a buffer has a fixed starting point and I need the 5 seconds to start with my first event.
Taking your requirements literally:
"At the first occurrence of state 1 I need to wait for 5 seconds and verify that the last state was also 1. Only then do I have a stable signal."
I can come up with a few ways to solve this problem.
To clarify my assumptions: you just want to push the last value produced 5 seconds after the first occurrence of a 1. This will result in a single-value sequence producing either a 0 or a 1 (i.e. regardless of any further values the source sequence produces past those 5 seconds).
Here I recreate your sequence with some jiggery-pokery.
var source = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(1))
    .Take(10)
    .Select(i => (i == 5 || i == 7 || i == 9) ? 1 : 0);  // Should produce 1
    //.Select(i => (i == 5 || i == 7) ? 1 : 0);          // Should produce 0
All of the options below share the source sequence. To share a sequence safely in Rx, we Publish() it and connect; here I use automatic connection via the RefCount() operator.
var sharedSource = source.Publish().RefCount();
1) In this solution we wait for the first value of 1, then buffer the values of the sequence into 5-second buffers and take only the first buffer. Once we have that buffer, we take its last value and push that. If the buffer is empty, I push a 1, on the assumption that the last value seen was the '1' that started the buffer running.
sharedSource.Where(state => state == 1)
    .Take(1)
    .SelectMany(_ => sharedSource.Buffer(TimeSpan.FromSeconds(5)).Take(1))
    .Select(buffer =>
    {
        if (buffer.Any())
        {
            return buffer.Last();
        }
        else
        {
            return 1;
        }
    })
    .Dump();
2) In this solution I only start listening once we get a valid value (a 1), take values until a timer triggers termination, and then take the last value produced.
var fromFirstValid = sharedSource.SkipWhile(state => state == 0);
fromFirstValid
    .TakeUntil(
        fromFirstValid.Take(1)
            .SelectMany(_ => Observable.Timer(TimeSpan.FromSeconds(5))))
    .TakeLast(1)
    .Dump();
3) In this solution I use the Window operator to create a single window that opens when the first value of '1' arrives and closes when 5 seconds have elapsed. Again we just take the last value.
sharedSource.Window(
        sharedSource.Where(state => state == 1),
        _ => Observable.Timer(TimeSpan.FromSeconds(5)))
    .SelectMany(window => window.TakeLast(1))
    .Take(1)
    .Dump();
So, lots of different ways to skin a cat.
It sounds (at a glance) like you want Throttle, not Buffer, although more information on your use case would help pin that down. At any rate, here's how you might Throttle your stream:
void Main()
{
    var subject = new Subject<int>();
    var source = subject.Publish().RefCount();

    var query = source
        // Emit the most recent value once 5 seconds pass without a newer one
        // (note: each incoming value restarts the 5-second clock)
        .Throttle(x => Observable.Timer(TimeSpan.FromSeconds(5)));

    using (query.Subscribe(Console.WriteLine))
    {
        // This sequence should produce a one
        subject.OnNext(1);
        subject.OnNext(0);
        subject.OnNext(1);
        subject.OnNext(0);
        subject.OnNext(1);
        subject.OnNext(1);
        Console.ReadLine();

        // This sequence should produce a zero
        subject.OnNext(0);
        subject.OnNext(0);
        subject.OnNext(0);
        subject.OnNext(0);
        subject.OnNext(1);
        subject.OnNext(0);
        Console.ReadLine();
    }
}
My goal is to count success and failure messages per second from source to destination, and to sum their results on a daily basis.
I had two options to do that:
1) Stream the events, then group them by time#source#destination:
KeyValueBytesStoreSupplier streamStore = Stores.persistentKeyValueStore("store-name");
sourceStream.selectKey((k, v) -> v.getDataTime() + KEY_SEPERATOR + SRC + KEY_SEPERATOR + DEST)
    .groupByKey()
    .aggregate(
        /* DO SOME Aggregation */,
        Materialized.<String, AggregationObject>as(streamStore)
            .withKeySerde(Serdes.String())
            .withValueSerde(AggregationObjectSerdes));
After trying this approach, we noticed that the state store keeps growing because the number of unique keys keeps increasing and, if I am correct, because the state changelog topics are "compact"-only, the keys never expire.
NumberOfUniqueKeys = 86,400 seconds in a day × SOURCE × DESTINATION
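For example, with (hypothetically) 50 sources and 50 destinations, that is 86,400 × 50 × 50 = 216,000,000 distinct keys per day.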
Then we thought that if we did not put the time field in the key, we could reduce the state store size, so we tried a windowing operation as our second approach.
2) Windowing with persistentWindowStore, a custom TimestampExtractor, windowedBy, and suppress:
WindowBytesStoreSupplier streamStore = Stores.persistentWindowStore(
        "store-name", Duration.ofHours(6), Duration.ofSeconds(1), false);
sourceStream.selectKey((k, v) -> SRC + KEY_SEPERATOR + DEST)
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofSeconds(1)).grace(Duration.ofSeconds(5)))
    .aggregate(
        /* DO SOME Aggregation */,
        Materialized.<String, AggregationObject>as(streamStore)
            .withKeySerde(Serdes.String())
            .withValueSerde(AggregationObjectSerdes))
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream();
With this second approach we reduced the state store size, but now we had a problem with late-arriving events. We added a 5-second grace period together with the suppress operation, but grace plus suppress still did not guarantee that every late event would be handled. Another side effect of the suppress operation is latency, because it only emits the aggregation result after the window's grace period has closed.
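If bounded latency matters more to you than emitting exactly one final result per window, one option to sketch (whether its semantics fit your use case is an assumption on my part) is to swap the untilWindowCloses suppression above for untilTimeLimit:

// Drop-in replacement for the .suppress(...) line above. Emits the latest
// aggregate at most 5 seconds after a key first enters the suppression buffer;
// later updates replace the buffered value but do NOT restart the timer.
// Trades "single final result" for bounded latency, so downstream consumers
// must tolerate more than one emission per window.
// The maxRecords bound is a hypothetical figure.
.suppress(Suppressed.untilTimeLimit(
        Duration.ofSeconds(5),
        Suppressed.BufferConfig.maxRecords(1_000_000).emitEarlyWhenFull()))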
BTW
using the windowing operation caused warning messages like
"WARN 1 --- [-StreamThread-2] o.a.k.s.state.internals.WindowKeySchema : Warning: window end time was truncated to Long.MAX"
I checked the reason in the source code and found it here:
https://github.com/a0x8o/kafka/blob/master/streams/src/main/java/org/apache/kafka/streams/state/internals/WindowKeySchema.java
/**
* Safely construct a time window of the given size,
* taking care of bounding endMs to Long.MAX_VALUE if necessary
*/
static TimeWindow timeWindowForSize(final long startMs,
final long windowSize) {
long endMs = startMs + windowSize;
if (endMs < 0) {
LOG.warn("Warning: window end time was truncated to Long.MAX");
endMs = Long.MAX_VALUE;
}
return new TimeWindow(startMs, endMs);
}
But it actually does not make any sense to me how endMs can be lower than 0...
Questions:
If we go with approach 1, how can we reduce the state store size? Approach 1 guaranteed that every event would be processed, with no events missed because of latency.
If we go with approach 2, how should I tune my logic to catch late-arriving data and reduce latency?
Why do I get the warning message in approach 2, although all time fields in my model are positive?
What other options can you suggest besides these two approaches?
I need some expert help :)
BR,
According to the Kafka mailing list, regarding the warning message
"WARN 1 --- [-StreamThread-2] o.a.k.s.state.internals.WindowKeySchema : Warning: window end time was truncated to Long.MAX"
I was told:
"You can get the message 'o.a.k.s.state.internals.WindowKeySchema : Warning: window end time was truncated to Long.MAX' when your TimeWindowedDeserializer is created without a windowSize. There are two constructors for a TimeWindowedDeserializer; are you using the one with windowSize?"
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedDeserializer.java#L46-L55
It calls WindowKeySchema with Long.MAX_VALUE as the window size:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedDeserializer.java#L84-L90
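That also explains the negative endMs: with windowSize defaulting to Long.MAX_VALUE, startMs + windowSize overflows a signed 64-bit long for any positive startMs. A minimal illustration:

public class OverflowDemo {
    public static void main(String[] args) {
        long startMs = 1_600_000_000_000L; // any positive epoch-millis timestamp
        long windowSize = Long.MAX_VALUE;  // the deserializer's default window size
        long endMs = startMs + windowSize; // wraps around past Long.MAX_VALUE
        System.out.println(endMs);         // prints a negative number
        System.out.println(endMs < 0);     // true -> triggers the warning branch
    }
}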
Say ...
you have about 20 Things
very often, you run a complex calculation looping over, say, 1000 items; the end result is a varying number of Things, around 20 each time
you don't know how many there will be until you have run through the whole loop
you then want to quickly (and of course elegantly!) access the result set in many places
for performance reasons you don't want to just make a new array each time; note that, unfortunately, the count differs each time, so you can't trivially reuse the same array
What about ...
var thingsBacking = [Thing](repeating: Thing(), count: 100) // hard limit!
var things: ArraySlice<Thing> = []
func fatCalculation() {
    var pin: Int = 0
    // happily, no need to clean out thingsBacking
    for c in /* .. some huge loop .. */ {
        // ... only some of the items (roughly 20, say) become the result
        // x = .. one of the result items
        thingsBacking[pin] = Thing(/* ... x, y, z */)
        pin += 1
    }
    // and then, magic of slices ...
    things = thingsBacking[0..<pin]
}
(Then, you can do this anywhere... for t in things { .. } )
What I am wondering: is there a way to tell an ArraySlice<Thing> to do that in one step, to "append to" the ArraySlice and avoid having to bother setting the length at the end?
So, something like this ..
things = [] // set it to zero length
things.quasiAppend(x)
things.quasiAppend(x2)
things.quasiAppend(x3)
With no further effort, things now has a length of three and indeed the three items are already in the backing array.
I'm particularly interested in performance here (unusually!)
Another approach,
var thingsBacking = [Thing?](repeating: Thing(), count: 100) // hard limit!
and just set the first one after your data to nil as an end-marker. Again, you don't have to waste time zeroing. But the end marker is a nuisance.
Is there a better way to solve this particular type of array-performance problem?
Based on MartinR's comments, it would seem that for the problem
the data points are incoming and
you don't know how many there will be until the last one (always less than a limit) and
you're having to redo the whole thing at high Hz
It would seem to be best to just:
(1) set up the array
var ra = [Thing](repeating: Thing(), count: 100) // hard limit!
(2) at the start of each run,
ra.removeAll(keepingCapacity: true)
(3) just go ahead and .append each one.
(4) you don't have to especially mark the end or set a length once finished.
It seems it will indeed reuse the same array backing. And it of course "increases the length", as it were, each time you append, and you can iterate happily at any time.
Slices - get lost!
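Putting the steps together, a minimal sketch (Thing, the loop bound, and the "qualifies" test are all placeholders):

struct Thing { var x = 0.0, y = 0.0, z = 0.0 }

var ra = [Thing](repeating: Thing(), count: 100)   // (1) set up the array once

func fatCalculation() {
    ra.removeAll(keepingCapacity: true)            // (2) length 0, storage kept
    for i in 0..<1000 {                            // the fat loop
        if i % 50 == 0 {                           // stand-in for "item qualifies"
            ra.append(Thing(x: Double(i)))         // (3) just append
        }
    }
    // (4) no end marker, no length bookkeeping
}

fatCalculation()
for t in ra { _ = t }                              // iterate anywhere, any time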
I'm trying to program a Zolertia Z1 node in Contiki. I need a counter to go from 0 to 120, with an etimer set to a 1-second delay (etimer_set(&et, CLOCK_SECOND)). When I try to count, it constantly prints the same number (0 or 1). I think I should use PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et)) and probably etimer_restart, so that after each second the counter is incremented and printed (1, 2, 3, ...), but obviously I'm doing something wrong in the while loop, or the functions are not right?
This code works for me:
PROCESS_THREAD(hello_world_process, ev, data)
{
  /* static: these must survive the yield inside the loop below */
  static struct etimer et;
  static int counter;

  PROCESS_BEGIN();

  etimer_set(&et, CLOCK_SECOND);
  while(1) {
    PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et));
    printf("timer called, counter=%d\n", counter++);
    etimer_reset(&et); /* re-arm from the previous expiration time */
  }

  PROCESS_END();
}
Potential pitfalls:
There is no process-local storage for in-process variables in Contiki processes. Meaning: if you want to keep the values of local variables across yields (such as PROCESS_WAIT_EVENT_UNTIL), declare them as static. Most likely this is the problem you're facing, as it would lead to the counter value being reset.
etimer_restart will drift; use etimer_reset instead to get a duration of exactly 120 seconds, as sketched below.
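The difference in one line each (as I understand the etimer API; worth checking against your Contiki version):

etimer_reset(&et);   /* re-arms relative to the PREVIOUS expiration time, so
                        ticks stay on an exact 1-second grid: no drift */
etimer_restart(&et); /* re-arms relative to "now", so the time spent in
                        printf() etc. is added on every round and the total
                        slowly exceeds 120 seconds */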
I would like a counter. The starting number will be:
10,000,000
Every 6 seconds, it will add 1, so it will be: 10,000,001 and then 10,000,002 and so on...
I would like to able to style the number: font-family, color, font-size, etc.
Can someone please help me?
JavaScript includes a function called setTimeout(), which causes a function to be called after a set time delay. Something like the following will do what you are asking. Ensure that your document includes a DOM element with the id counter. Then:
var counter = 10000000;
function incrementCounter() {
counter++;
$('#counter').html(counter);
setTimeout(incrementCounter, 6000);
}
setTimeout(incrementCounter, 6000);
What's going on here? setTimeout takes two arguments: the function to be called, and the time delay in milliseconds. The final line schedules our incrementCounter() function to run after a delay of six seconds. The function increments the counter variable, sets the DOM element's text to the counter's value, then sets the timeout again: this means the function will run every six seconds until something stops it.
As for styling the counter, this can be done either with static CSS or with jQuery's style-manipulation functions.
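For example, via jQuery's css() (the specific styles are just placeholders):

$('#counter').css({
    fontFamily: 'Georgia, serif',
    color: '#c0392b',
    fontSize: '32px'
});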
You can make use of setInterval to run a function every 6000 milliseconds:
var num = 10000000;
setInterval(function() {
    num++;
    console.log(num);
    // display with thousands separators, e.g. 10,000,001
    $('div').text(num.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ","));
}, 6000);
Here's an example : https://jsfiddle.net/DinoMyte/sac63azn/3/
Apologies if the question is poorly phrased, I'll do my best.
If I have a sequence of timestamped values as an Observable[(U, T)], where U is a value type and T is a time-like type (or anything difference-able, I suppose), how could I write an operator that acts as an auto-resetting one-touch barrier: one that stays silent while abs(u_n - u_reset) < barrier, but emits t_n - t_reset when the barrier is touched, at which point it also resets u_reset = u_n?
That is to say, the first value this operator receives becomes the baseline, and it emits nothing. From then on it monitors the values of the stream, and as soon as one of them moves beyond the barrier around the baseline (above or below), it emits the elapsed time (measured by the timestamps of the events) and resets the baseline. These times will then be processed to form a high-frequency estimate of volatility.
For reference, I am trying to write a volatility estimator outlined in http://www.amazon.com/Volatility-Trading-CD-ROM-Wiley/dp/0470181990 , where rather than measuring the standard deviation (deviations at regular homogeneous times), you repeatedly measure the time taken to breach a barrier for some fixed barrier amount.
Specifically, could this be written using existing operators? I'm a bit stuck on how the state would be reset; maybe I need two nested operators, one that is one-shot and another that keeps re-creating the one-shot. I know it could be done by writing one by hand, but then I would need to write my own publisher, etc.
Thanks!
I don't fully understand the algorithm and the variables in your example, but you can use flatMap with some heap state and return empty() or just() as needed:
int[] var1 = { 0 };
source.flatMap(v -> {
    var1[0] += v;
    if ((var1[0] & 1) == 0) {
        return Observable.just(v);
    }
    return Observable.empty();
});
If you need a per-sequence state because of multiple consumers, you can defer the whole thing:
Observable.defer(() -> {
    int[] var1 = { 0 };
    return source.flatMap(v -> {
        var1[0] += v;
        if ((var1[0] & 1) == 0) {
            return Observable.just(v);
        }
        return Observable.empty();
    });
}).subscribe(...);
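For the barrier logic itself, the same defer-plus-heap-state idea could be sketched like this. It assumes RxJava's Timed<Double> as the (value, time) pair, a fixed barrier threshold, and that values are never NaN; all of those are illustrative choices, not a definitive implementation:

Observable<Long> barrierElapsed(Observable<Timed<Double>> source, double barrier) {
    return Observable.defer(() -> {
        double[] uReset = { Double.NaN }; // baseline value, NaN = "not set yet"
        long[] tReset = { 0L };           // baseline timestamp
        return source.flatMap(e -> {
            if (Double.isNaN(uReset[0])) {
                // first value becomes the baseline; emit nothing
                uReset[0] = e.value();
                tReset[0] = e.time();
                return Observable.empty();
            }
            if (Math.abs(e.value() - uReset[0]) >= barrier) {
                long elapsed = e.time() - tReset[0]; // t_n - t_reset
                uReset[0] = e.value();               // u_reset = u_n
                tReset[0] = e.time();
                return Observable.just(elapsed);
            }
            return Observable.empty();               // inside the barrier: silent
        });
    });
}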