How to buffer based on time and count, but stopping the timer if no events occur

How to buffer based on time and count, but stopping the timer if no events occur - system.reactive

I'm producing a sequence of 50 items each tree seconds. I then want to batch them at max 20 items, but also not waiting more than one second before I release the buffer.
That works great!
But since the interval never dies, Buffer keeps firing empty batch chunks...
How can I avoid that? Shure Where(buf => buf.Count > 0)should help - but that seems like a hack.
Observable
.Interval(TimeSpan.FromSeconds(3))
.Select(n => Observable.Repeat(n, 50))
.Merge()
.Buffer(TimeSpan.FromSeconds(1), 20)
.Subscribe(e => Console.WriteLine(e.Count));
Output:
0-0-0-20-20-10-0-20-20-10-0-0-20-20

The Where filter you propose is a sound approach, I'd go with that.
You could wrap the Buffer and Where into a single helper method named to make the intent clearer perhaps, but rest assured the Where clause is idiomatic Rx in this scenario.
Think of it this way; an empty Buffer is relaying information that no events occurred in the last second. While you can argue that this is implicit, it would require extra work to detect this if Buffer didn't emit an empty list. It just so happens it's not information you are interested in - so Where is an appropriate way to filter this information out.
A lazy timer solution
Following from your comment ("...the timer... be[ing] lazily initiated...") you can do this to create a lazy timer and omit the zero counts:
var source = Observable.Interval(TimeSpan.FromSeconds(3))
.Select(n => Observable.Repeat(n, 50))
.Merge();
var xs = source.Publish(pub =>
pub.Buffer(() => pub.Take(1).Delay(TimeSpan.FromSeconds(1))
.Merge(pub.Skip(19)).Take(1)));
xs.Subscribe(x => Console.WriteLine(x.Count));
Explanation
Publishing
This query requires subscribing to the source events multiple times. To avoid unexpected side-effects, we use Publish to give us pub which is a stream that multicasts the source creating just a single subscription to it. This replaces the older Publish().RefCount() technique that achieved the same end, effectively giving us a "hot" version of the source stream.
In this case, this is necessary to ensure the subsequent buffer closing streams produced after the first will start with the current events - if the source was cold they would start over each time. I wrote a bit about publishing here.
The main query
We use an overload of Buffer that accepts a factory function that is called for every buffer emitted to obtain an observable stream whose first event is a signal to terminate the current buffer.
In this case, we want to terminate the buffer when either the first event into the buffer has been there for a full second, or when 20 events have appeared from the source - whichever comes first.
To achieve this we Merge streams that describe each case - the Take(1).Delay(...) combo describes the first condition, and the Skip(19).Take(1) describes the second.
However, I would still test performance the easy way, because I still suspect this is overkill, but a lot depends on the precise details of the platform and scenario etc.

After using the accepted answer for quite a while I would now suggest a different implementation (inspired by James Skip / Take approach and this answer):
var source = Observable.Interval(TimeSpan.FromSeconds(3))
.Select(n => Observable.Repeat(n, 50))
.Merge();
var xs = source.BufferOmitEmpty(TimeSpan.FromSeconds(1), 20);
xs.Subscribe(x => Console.WriteLine(x.Count));
With an extension method BufferOmitEmpty like:
public static IObservable<IList<TSource>> BufferOmitEmpty<TSource>(this IObservable<TSource> observable, TimeSpan maxDelay, int maxBufferCount)
{
return observable
.GroupByUntil(x => 1, g => Observable.Timer(maxDelay).Merge(g.Skip(maxBufferCount - 1).Take(1).Select(x => 1L)))
.Select(x => x.ToArray())
.Switch();
}
It is 'lazy', because no groups are created as long as there are no elements on the source sequence, so there are no empty buffers. As in Toms answer there is an other nice advantage to the Buffer / Where implementation, that is the buffer is started when the first element arrives. So elements following each other within buffer time after a quiet period are processed in the same buffer.
Why not to use the Buffer method
Three problems occured when I was using the Buffer approach (they might be irrelevant for the scope of the question, so this is a warning to people who use stack overflow answers in different contexts like me):
Because of the Delay one thread is used per subscriber.
In scenarios with long running subscribers elements from the source sequence can be lost.
With multiple subscribers it sometimes creates buffers with count greater than maxBufferCount.
(I can supply sample code for 2. and 3. but I'm insecure whether to post it here or in a different question because I cannot fully explain why it behaves this way)

RxJs5 has hidden features buried into their source code. It turns out it's pretty easy to achieve with bufferTime
From the source code, the signature looks like this:
export function bufferTime<T>(this: Observable<T>, bufferTimeSpan: number, bufferCreationInterval: number, maxBufferSize: number, scheduler?: IScheduler): Observable<T[]>;
So your code would be like this:
observable.bufferTime(1000, null, 20)

Related

Opportunistic, partially and asyncronously pre-processing of a syncronously processing iterator

Let us use Scala.
I'm trying to find the best possible way to do an opportunistic, partial, and asynchronous pre-computation of some of the elements of an iterator that is otherwise processed synchronously.
The below image illustrates the problem.
There is a lead thread (blue) that takes an iterator and a state. The state contains mutable data that must be protected from concurrent access. Moreover, the state must be updated while the iterator is processed from the beginning, sequentially, and in order because the elements of the iterator depend on previous elements. Moreover, the nature of the dependency is not known in advance.
Processing some elements may lead to substantial overhead (2 orders of magnitude) compared to others, meaning that some elements are 1ms to compute and some elements are 300ms to compute. It would lead to significant improvements in terms of running time if I could pre-process the next k elements speculatively. A speculative pre-processing on asynchronous threads is possible (while the blue thread is synchronously processing), but the pre-processed data must be validated by the blue thread, whether the result of pre-computation is valid at that time. Usually (90% of the time), it should be valid. Thus, launching separate asynchronous threads to pre-process the remaining portion of the iterator speculatively would spear many 300s of milliseconds in running time.
I have studied comparisons of asynchronous and functional libraries of Scala to understand better which model of computation, or in other words, which description of computation (which library) could be a better fit to this processing problem. I was thinking about communication patterns and came about with the following ideas:
AKKA
Use an AKKA actor Blue for the blue thread that takes the iterator, and for each step, it sends a Step message to itself. On a Step message, before it starts the processing of the next ith element, it sends a PleasePreprocess(i+k) message with the i+kth element to one of the k pre-processor actors in place. The Blue would Step to i+1 only and only if PreprocessingKindlyDone(i+1) is received.
AKKA Streams
AFAIK AKKA streams also support the previous two-way backpressure mechanism, therefore, it could be a good candidate to implement what actors do without actually using actors.
Scala Futures
While the blue thread processes elements ˙processElement(e)˙ in iterator.map(processElement(_)), then it would also spawn Futures for preprocessing. However, maintaining these pre-processing Futures and awaiting their states would require a semi-blocking implementation in pure Scala as I see, so I would not go with this direction to the best of my current knowledge.
Use Monix
I have some knowledge of Monix but could not wrap my head around how this problem could be elegantly solved with Monix. I'm not seeing how the blue thread could wait for the result of i+1 and then continue. For this, I was thinking of using something like a sliding window with foldLeft(blueThreadAsZero){ (blue, preProc1, preProc2, notYetPreProc) => ... }, but could not find a similar construction.
Possibly, there could be libraries I did not mention that could better express computational patterns for this.
I hope I have described my problem adequately. Thank you for the hints/ideas or code snippets!

You need blocking anyhow, if your blue thread happens to go faster than the yellow ones. I don't think you need any fancy libraries for this, "vanilla scala" should do (like it actually does in most cases). Something like this, perhaps ...
def doit[T,R](it: Iterator[T], yellow: T => R, blue: R => R): Future[Seq[R]] = it
.map { elem => Future(yellow(elem)) }
.foldLeft(Future.successful(List.empty[R])) { (last, next) =>
last.flatMap { acc => next.map(blue).map(_ :: acc) }
}.map(_.reverse)
I didn't test or compile this, so it could need some tweaks, but conceptually, this should work: pass through the iterator and start preprocessing right away, then fold to tuck the "validation" on each completing preprocess sequentially.

I would split the processing into two steps, the pre-processing that could be run in parallel and the dependent one which has to be serial.
Then, you can just create a stream of data from the iterator do a parallel map applying the preprocess step and finish with a fold
Personally I would use fs2, but the same approach can be expressed with any streaming solution like AkkaStreams, Monix Observables or ZIO ZStreams
import fs2.Stream
import cats.effect.IO
val finalState =
Stream
.fromIterator[IO](iterator = ???, chunkSize = ???)
.parEvalMap(elem => IO(preProcess(elem))
.compile
.fold(initialState) {
case (acc, elem) =>
computeNewState(acc, elem)
}
PS: Remember to benchmark to make sure parallelism is actually speeding things up; it may not be worth the hassle.

Why/How should I use Publish without Connect?

Why/how should I use .Publish() without a Connect or RefCount call following? What does it do? Example code:
var source = new Subject<int>();
var pairs = source.Publish(_source => _source
.Skip(1)
.Zip(_source, (newer, older) => (older, newer))
);
pairs.Subscribe(p => Console.WriteLine(p));
source.OnNext(1);
source.OnNext(2);
source.OnNext(3);
source.OnNext(4);
How is pairs different from pairs2 here:
var pairs2 = source
.Skip(1)
.Zip(source, (newer, older) => (older, newer));

The Publish<TSource, TResult>(Func<IObservable<TSource, IObservable<TResult>> selector) overload is poorly documented. Lee Campbell doesn't cover it in introtorx.com. It doesn't return an IConnectableObservable, which is what most people associate with Publish, and therefore doesn't require or support a Connect or RefCount call.
This form of Publish is basically a form of defensive coding, against possible side-effects in a source observable. It subscribes once to the source, then can safely 'multicast' all messages via the passed in parameter. If you look at the question code, there's only once mention of source, and two mentions of _source. _source here is the safely multicasted observable, source is the unsafe one.
In the above example, the source is a simple Subject, so it's not really unsafe, and therefore Publish has no effect. However, if you were to replace source with this:
var source = Observable.Create<int>(o =>
{
Console.WriteLine("Print me once");
o.OnNext(1);
o.OnNext(2);
o.OnNext(3);
o.OnNext(4);
return System.Reactive.Disposables.Disposable.Empty;
});
...you would find "Print me once" printed once with pairs (correct), and twice with pairs2. This effect has similar implications where your observable wraps things like DB queries, web requests, network calls, file reads, and other side-effecting code that you want to happen only once and not multiple times.
TL;DR: If you have an observable query that references an observable twice, it is best to wrap that observable in a Publish call.

Confused about Observable vs. Single in functions like readCharacteristic()

In the RxJava2 version of RxAndroidBle, the functions readCharacteristic() and writeCharacteristic() return Single<byte[]>.
The example code to read a characteristic is:
device.establishConnection(false).flatMap(rxBleConnection -> rxBleConnection.readCharacteristic(characteristicUUID))
But the documentation for flatMap() says the mapping function is supposed to return an ObservableSource. Here, it returns a Single. How can this work?
Update: I looked at possibilities using operators like .single() and .singleOrError() but they all seem to require that the upstream emits one item and then completes. But establishConnection() doesn't ever complete. (This is one reason I suggested that perhaps establishConnection() should be reimagined as a Maybe, and some other way be provided to disconnect rather than just unsubscribing.)

You're totally correct, this example cannot be compiled. it's probably leftover from RxJava1 version, where Single wasn't exists.
Simple fix with the same result is to use RxJava2 flatMapSingle for instance:
device.establishConnection(false)
.flatMapSingle(rxBleConnection -> rxBleConnection.readCharacteristic(characteristicUUID))
flatMapSingle accepts a Single as the return value, and will map the success value of the input Single to an emission from the upstream Observable.
The point is, that RxJava has more specific Observable types, that exposes the possible series of emission expected from this Observable. Some methods now return Single as this is the logical operation of their stream (readCharacteristic()), some Observable as they will emit more than single emission (establishConnection() - connection status that can be changed over time).
But RxJava2 also provided many operators to convert between the different types and it really depends on your needs and scenario.

Thanks Rob!
In fact, the README was deprecated and required some pimping here and there. Please have a look if it's ok now.

I think I found the answer I was looking for. The crucial point:
Single.fromObservable(observableSource) doesn't do anything until it receives the second item from observableSource! Assuming that the first item it receives is a valid emission, then if the second item is:
onComplete(), it passes the first item to onSuccess();
onNext(), it signals IndexOutOfBoundsException since a Single can't emit more than one item;
onError(), it presumably forwards the error downstream.
Now, device.establishConnection() is a 1-item, non-completing Observable. The RxBleConnecton it emits is flatMapped to a Single with readCharacteristic(). But (another gotcha), flatMapSingle subscribes to these Singles and combines them into an Observable, which doesn't complete until the source establishConnection() does. But the source doesn't ever complete! Therefore the Single we're trying to create won't emit anything, since it doesn't receive that necessary second item.
The solution is to force the generation of onComplete() after the first (and only) item, which can be done with take(1). This will satisfy the Single we're creating, and cause it to emit the Characteristic value we're interested in. Hope that's clear.
The code:
Single<byte[]> readCharacteristicSingle( RxBleDevice device, UUID characteristicUUID ) {
return Single.fromObservable(
device.establishConnection( false )
.flatMapSingle( connection -> connection.readCharacteristic( characteristicUUID ) )
.take( 1L ) // make flatMapSingle's output Observable complete after the first emission
// (this makes the Single call onSuccess())
);
}

Rx Observable Window with closing function with parameter

I'm trying to separate observable into windows (or for my purposes also Buffers are fine) while being able to close windows/buffers at custom location.
E.g. I have an observable which produces integers starting at 1 and moving up. I want to close a window at each number which is divisible by 7. My closing function would need to take in the item as parameter in that case.
There is an overload of Window method:
Window<TSource, TWindowClosing>(IObservable<TSource>, Func<IObservable<TWindowClosing>>)
Either it cant be done using this overload, or I can't wrap my head around it. Documentation describes that it does exactly what I want but does not show an example. Also, it shows an example of non-deterministic closing, which depends on timing when closing observable collection emits items.
The Window operator breaks up an observable sequence into consecutive
non-overlapping windows. The end of the current window and start of
the next window is controlled by an observable sequence which is the
result of the windowClosingSelect function which is passed as an input
parameter to the operator. The operator could be used to group a set
of events into a window. For example, states of a transaction could be
the main sequence being observed. Those states could include:
Preparing, Prepared, Active, and Committed/Aborted. The main sequence
could include all of those states are they occur in that order. The
windowClosingSelect function could return an observable sequence that
only produces a value on the Committed or Abort states. This would
close the window that represented transaction events for a particular
transaction.
I'm thinking something like following would do the job, but I'd have to implement it myself:
Window<TSource, TWindowClosing>(IObservable<TSource>, Func<TSource, bool>)
Is such windowing possible with built-in functions (I know I can build one myself)?
Is it possible to close a window based on emitted item or only non-deterministically, once an item is emitted from windowing observable?

Use the original sequence with a Where clause as your closing sequence. If your source sequence is cold, then make use of Publish and RefCount to make it work correctly.
var source = ...;
var sharedSource = source.Publish().RefCount();
var closingSignal = sharedSource.Where(i => (i % 7) == 0);
var windows = sharedSource.Window(() => closingSignal);

Serving multiple result Futures as soon as available to a client

I have a page that is populated by data that I get using different calls to distant servers. Some requests take longer than others, the way I do things now is that I do all the calls at once and wrap the whole thing in a Future, then put the the whole thing in a Action.async for Play to handle.
This, theoretically, does the job but I don't want my users to be waiting a long time and instead start loading the page part by part. Meaning that as soon as data is available for a given request to a distant server, it should be sent to the client as Json or whatever.
I was able to partially achieve this using EventSource by modifying Play's event-source sample by doing something like this:
Ok.chunked((enumerator1 &> EventSource()) >- (enumerator2 &> EventSource())).as("text/event-stream")
and the enumerators as follows:
val enumerator1: Enumerator[String] = Enumerator.generateM{
Future[Option[String]]{Thread.sleep(1500); Some("Hello")}
}
val enumerator2: Enumerator[String] = Enumerator.generateM{
Future[Option[String]]{Thread.sleep(2000); Some("World!")}
}
As you probably have guessed, I was expecting to have "Hello" after 1.5s and then "World!" 0.5s later sent to the client, but I ended up receiving "Hello" every 1.5s and "World!" every 2s.
My questions are:
Is there a way to stop sending an information once it has been correctly delivered to the client using the method above?
Is there a better way to achieve what I want?

You don't want generateM, it's for building enumerators that can return multiple values. generateM takes a function that either returns a Some, to produce the next value for the Enumerator, or None, to signal that the Enumerator is complete. Because your function always returns Some, you create Enumerators that are infinite in length.
You just want to convert a Future into an Enumerator, to create an Enumerator with a single element:
Enumerator.flatten(future.map(Enumerator(_)))
Also, you can interleave your enumerators and then feed the result into EventSource(). Parenthesis are unnecessary as well (methods that start with > have precedence over methods with &).
enumerator1 >- enumerator2 &> EventSource()