RxJava Schedulers.trampoline versus concatMap - rx-java2

It seems, based on the documentation, that Schedulers.trampoline ensures elements are emitted first-in, first-out (i.e., in order). It also seems that the point of concatMap is to ensure that everything is lined up appropriately and then emitted. So I was wondering whether there is ever a point in applying subscribeOn/observeOn(Schedulers.trampoline()) and then using the concatMap operator afterwards, as opposed to a regular mapping operator.

Yes, there's a point. Take this example:
Observable.just(1, 2, 3, 4, 5)
    .subscribeOn(Schedulers.trampoline())
    .flatMap(
        a -> {
          if (a < 3) {
            return Observable.just(a).delay(3, TimeUnit.SECONDS);
          } else {
            return Observable.just(a);
          }
        })
    .doOnNext(
        a -> System.out.println("Element: " + a + ", on: " + Thread.currentThread().getName()))
    .subscribe();
// When run from main(), keep the JVM alive afterwards (e.g. Thread.sleep(4000)):
// delay() emits on RxJava's daemon computation threads.
Here's the output:
Element: 3, on: main
Element: 4, on: main
Element: 5, on: main
Element: 1, on: RxComputationScheduler-1
Element: 2, on: RxComputationScheduler-2
What's happening here is that 1 and 2 reach the flatMap operator in sequence, but the inner streams for these elements are delayed by 3 seconds. Note that flatMap eagerly subscribes to the inner streams. That is, it does not wait for one stream to finish (with onComplete) before subscribing to the next inner stream (which is what concatMap does).
So the inner streams of 1 and 2 are delayed by 3 seconds. You can think of this as an external I/O call that takes a while. Meanwhile, the next 3 elements (3, 4, 5) enter flatMap and their streams finish immediately. That's why 3, 4 and 5 are printed first, in order, on main.
Once the 3 seconds are up, elements 1 and 2 are emitted. Note that there's no guarantee that 1 will come before 2.
Now replace flatMap with concatMap and you'll see that the sequence is maintained:
Element: 1, on: RxComputationScheduler-1
Element: 2, on: RxComputationScheduler-2
Element: 3, on: RxComputationScheduler-2
Element: 4, on: RxComputationScheduler-2
Element: 5, on: RxComputationScheduler-2
Why? Because that's how concatMap works. Element 1 arrives and is used in an I/O call. It takes 3 seconds before its inner stream emits onComplete, and concatMap does not subscribe to the inner streams of the remaining elements until that happens. As soon as it does, the next stream (Observable.just(2).delay(3, TimeUnit.SECONDS)) is subscribed to, and so on. That is how the order is maintained.
The thing you need to remember about these two operators is: flatMap eagerly subscribes to the inner streams, as and when the elements arrive. On the other hand, concatMap waits for one stream to finish before it subscribes to the next one. That's why you can't make parallel calls with concatMap.
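The following sketch compares the two subscription strategies side by side without RxJava on the classpath. It uses plain Java CompletableFuture and is only an analogy, not RxJava code. The names EagerVsSequential, inner, eager and sequential are hypothetical, and the 300 ms delay stands in for delay(3, TimeUnit.SECONDS). eager starts every inner task up front, like flatMap. sequential starts the next inner task only after the previous one has completed, like concatMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

public class EagerVsSequential {

    // Stand-in for the inner Observable: 1 and 2 complete after a delay on another
    // thread, the other elements complete immediately (mirrors the example above).
    static CompletableFuture<Integer> inner(int a) {
        if (a < 3) {
            Executor later = CompletableFuture.delayedExecutor(300, TimeUnit.MILLISECONDS);
            return CompletableFuture.supplyAsync(() -> a, later);
        }
        return CompletableFuture.completedFuture(a);
    }

    // flatMap-like: start every inner task at once, record results as they finish.
    static List<Integer> eager(List<Integer> input) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        List<CompletableFuture<Void>> all = new ArrayList<>();
        for (int a : input) {
            all.add(inner(a).thenAccept(out::add));
        }
        CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(out);
    }

    // concatMap-like: the next inner task is only started once the previous one completed.
    static List<Integer> sequential(List<Integer> input) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (int a : input) {
            chain = chain.thenCompose(ignored -> inner(a)).thenAccept(out::add);
        }
        chain.join();
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        System.out.println("eager:      " + eager(input));
        System.out.println("sequential: " + sequential(input));
    }
}
```

With eager, 3, 4 and 5 are recorded first, and the relative order of 1 and 2 is not guaranteed. sequential always yields [1, 2, 3, 4, 5], at the cost of waiting for each delay in turn.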

Not really. Schedulers.trampoline() essentially executes work in FIFO order on one of the threads calling its Worker.schedule method.
In the case of Observable.subscribeOn(Schedulers.trampoline()), that will be the subscribing thread, so applying it has no practical effect.
In the case of Observable.observeOn(Schedulers.trampoline()), it will be the thread signaling the items, so there is no practical effect there either.
concatMap executes the mapper function either on the thread signaling the upstream items or on the thread on which the inner Observable completes. The operator essentially has built-in trampolining already, so upstream items and downstream completion do not overlap. In 3.x, there will be an overload taking a Scheduler, for which Schedulers.trampoline() would have no practical effect either.
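The sketch below illustrates what "trampolining" means. It is a minimal, single-threaded trampoline written for illustration; MiniTrampoline is a hypothetical class, not RxJava's implementation. A task scheduled while another task is running is queued and runs afterwards on the same thread, in FIFO order, instead of being run recursively. RxJava's TrampolineScheduler additionally supports delayed tasks and concurrent callers, and this sketch handles neither.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustration only: single-threaded, no delays, not thread-safe.
public class MiniTrampoline {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private boolean draining;

    // Runs the task on the calling thread. If a task is already running, the new
    // task is only enqueued, so it runs after the earlier ones (FIFO) and the
    // call stack does not grow with nested schedule() calls.
    public void schedule(Runnable task) {
        queue.add(task);
        if (draining) {
            return;
        }
        draining = true;
        try {
            Runnable next;
            while ((next = queue.poll()) != null) {
                next.run();
            }
        } finally {
            draining = false;
        }
    }

    public static void main(String[] args) {
        MiniTrampoline t = new MiniTrampoline();
        List<String> log = new ArrayList<>();
        t.schedule(() -> {
            log.add("A start");
            t.schedule(() -> log.add("B")); // queued, not run inline
            t.schedule(() -> log.add("C"));
            log.add("A end");
        });
        System.out.println(log); // [A start, A end, B, C]
    }
}
```

Because nested work is queued rather than run inline, a task can re-schedule itself many times without a StackOverflowError.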
The best use case for Schedulers.trampoline() is in unit tests where you don't need asynchrony. Therefore, you either parameterize your subscribeOn/observeOn usages or use a scheduler hook and replace the standard schedulers:
RxJavaPlugins.setComputationSchedulerHandler(s -> Schedulers.trampoline());
RxJavaPlugins.setIoSchedulerHandler(s -> Schedulers.trampoline());
RxJavaPlugins.setNewThreadSchedulerHandler(s -> Schedulers.trampoline());
Then, once you are done, call:
RxJavaPlugins.reset();
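The "parameterize your subscribeOn/observeOn usages" option comes down to injecting the threading policy instead of hard-coding it. The sketch below is a plain-Java analogy, not RxJava code. It injects a java.util.concurrent.Executor into a hypothetical GreetingService. In production the service gets a real thread pool. In a test it gets Runnable::run, a direct executor that keeps the work on the calling thread, much like Schedulers.trampoline() (although, unlike trampoline, it does not queue nested tasks). In RxJava you would inject a Scheduler the same way and pass Schedulers.trampoline() in tests.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical service whose asynchrony is injected rather than hard-coded.
public class GreetingService {
    private final Executor worker;

    GreetingService(Executor worker) {
        this.worker = worker;
    }

    // Builds the greeting on the injected executor and hands it to the callback.
    void greet(String name, Consumer<String> callback) {
        worker.execute(() -> callback.accept("Hello, " + name));
    }

    public static void main(String[] args) {
        // Production wiring: the work runs on a background thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        new GreetingService(pool).greet("prod", System.out::println);
        pool.shutdown();

        // Test wiring: the work runs inline, so the result is there right away.
        StringBuilder seen = new StringBuilder();
        new GreetingService(Runnable::run).greet("test", seen::append);
        System.out.println(seen); // Hello, test
    }
}
```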

Related

RxSwift - How does merge handle simultaneous events?

So I have code that looks something like this:
let a = Observable.just("A").delay(.seconds(3), scheduler: MainScheduler.instance)
let b = Observable.just("B").delay(.seconds(3), scheduler: MainScheduler.instance)
Observable.merge([a, b]).subscribe(onNext: { print($0) })
I thought the printed order would be random, as whichever delay finishes marginally earlier would be emitted first by merge. But the order seems to be determined solely by the order of the variables in the array passed to merge.
That means .merge([a, b]) prints
A
B
but .merge([b, a]) prints
B
A
Why is this the case? Does it mean that merge somehow handles the concurrent events and I can rely on the fact that the events with same delay will always be emitted in their order in array?
I know I could easily solve it by using .from(["A", "B"]), but I am now curious to know how exactly merge works.
You put both delays on the same thread (and I suspect the subscriptions are also happening on that thread), so they will execute on that thread in the order they are subscribed to. Since the current implementation of merge subscribes to the observables in the order it receives them, you get the behavior you are seeing. However, it's important to note that you cannot rely on this: there is no guarantee that it will hold for every version of the library.
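The same mechanism can be shown with the JDK: equal delays queued on a single thread fire in scheduling order. This is a Java analogy, not RxSwift. A single-thread ScheduledExecutorService stands in for MainScheduler.instance, and emitAfterSameDelay is a hypothetical helper. ScheduledThreadPoolExecutor documents that tasks scheduled for exactly the same execution time run in FIFO order of submission. As the answer says, RxSwift itself makes no such promise, so this only illustrates why the observed order matches the array order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SameThreadDelays {
    // Schedules one "emission" per name, all with the same delay, on one thread
    // and returns the order in which they actually ran.
    static List<String> emitAfterSameDelay(List<String> subscriptionOrder) {
        List<String> printed = new CopyOnWriteArrayList<>();
        ScheduledExecutorService mainThread = Executors.newSingleThreadScheduledExecutor();
        for (String name : subscriptionOrder) {
            mainThread.schedule(() -> { printed.add(name); }, 300, TimeUnit.MILLISECONDS);
        }
        // By default, already-scheduled delayed tasks still run after shutdown().
        mainThread.shutdown();
        try {
            mainThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(emitAfterSameDelay(List.of("A", "B"))); // [A, B]
        System.out.println(emitAfterSameDelay(List.of("B", "A"))); // [B, A]
    }
}
```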

An equivalent of Single.merge, which runs each Single sequentially

The Single.merge documentation says:
Merges an Iterable sequence of SingleSource instances into a single
Flowable sequence, running all SingleSources at once.
Is there a similar operator that creates a Flowable which, instead of running all SingleSources at once, runs them sequentially, each one after the previous one completes?
I've found a solution:
val singles: List<Single<String>> = // the list of Single
Flowable
.fromIterable(singles)
.flatMapSingle({ it }, false, /* maxConcurrency */ 1)
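The third argument of flatMapSingle caps how many inner Singles are subscribed at the same time, so a value of 1 runs them one after another. The plain-Java sketch below shows the same knob, using a Semaphore instead of RxJava's internals. runAll and after are hypothetical helpers. With maxConcurrency = 1, a slow first task is never overtaken by a fast second one.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class BoundedConcurrency {
    // Runs the tasks with at most maxConcurrency in flight and returns the results
    // in completion order. maxConcurrency = 1 therefore means strictly sequential.
    static <T> List<T> runAll(List<Supplier<T>> tasks, int maxConcurrency) {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore slots = new Semaphore(maxConcurrency);
        CountDownLatch finished = new CountDownLatch(tasks.size());
        List<T> results = new CopyOnWriteArrayList<>();
        try {
            for (Supplier<T> task : tasks) {
                slots.acquire(); // wait for a free slot before starting the next task
                pool.execute(() -> {
                    try {
                        results.add(task.get());
                    } finally {
                        slots.release();
                        finished.countDown();
                    }
                });
            }
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return results;
    }

    // Hypothetical task: sleeps for the given time, then returns the value.
    static <T> Supplier<T> after(long millis, T value) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }

    public static void main(String[] args) {
        List<Supplier<String>> tasks = List.of(after(200, "slow"), after(0, "fast"));
        System.out.println(runAll(tasks, 1)); // [slow, fast]
        System.out.println(runAll(tasks, 2)); // typically [fast, slow]
    }
}
```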

Tail recursion when loading lot of items

I need to load a lot of small files from an api that allows me to load only one file at a time. As they are very small I start several downloads at a time. Depending on the result I start the next batch load.
For each request I use an observable, and then combine several with combineLatest. After combineLatest I do a flatMap and concat a new call to the same function.
As an abstraction, I do this (pseudocode, not compiling):
func loadRecursively(items) -> Observable<XY> {
combineLatest(requestObservables)
.flatMap {
return loadRecursively(items-loadedItems)
}
}
This works perfectly in general.
The problem: this leads to a growing recursive tail which, it seems, is not cut off by compiler optimisation. So when loading several thousand files, the stack keeps growing until the app finally terminates.
How would I avoid the growing tail? Or in general how would I approach this problem with rx?
RxSwift has a concatMap operator (because people had faced the same problem) that lets you loop through your Observables sequentially.
Simple example:
Observable.from([1, 2, 3, 4])
.concatMap(Observable.just)
.subscribe(onNext: {
print($0)
})
.disposed(by: bag)
Prints:
1
2
3
4
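The key idea is to process one batch after another instead of having each batch start the next one through a recursive call. The Java sketch below expresses that idea with a plain loop and CompletableFuture, in place of RxSwift's concatMap. Assumptions: loadAll and loadOne are hypothetical names, all batches are fixed up front (the question's "depending on the result" logic is left out), and the loader is a stub. The requests within a batch run concurrently. The loop waits for the whole batch before starting the next, so the stack depth stays constant however many files there are.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class BatchLoader {
    // Loads items in batches of batchSize: requests of one batch run concurrently,
    // and the next batch only starts after the current one finished. A loop
    // (instead of a function calling itself per batch) keeps the stack flat.
    static List<String> loadAll(List<String> items, int batchSize,
                                Function<String, CompletableFuture<String>> loadOne) {
        List<String> loaded = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            List<String> batch = items.subList(start, Math.min(start + batchSize, items.size()));
            List<CompletableFuture<String>> requests = new ArrayList<>();
            for (String item : batch) {
                requests.add(loadOne.apply(item)); // fire all requests of this batch
            }
            for (CompletableFuture<String> request : requests) {
                loaded.add(request.join()); // wait for the whole batch
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 5_000; i++) {
            files.add("file-" + i);
        }
        // Hypothetical loader: "downloads" a file by upper-casing its name.
        List<String> result = loadAll(files, 20,
                name -> CompletableFuture.supplyAsync(name::toUpperCase));
        System.out.println(result.size()); // 5000
    }
}
```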

Rx Observable Window with closing function with parameter

I'm trying to separate observable into windows (or for my purposes also Buffers are fine) while being able to close windows/buffers at custom location.
E.g. I have an observable which produces integers starting at 1 and moving up. I want to close a window at each number which is divisible by 7. My closing function would need to take in the item as parameter in that case.
There is an overload of Window method:
Window<TSource, TWindowClosing>(IObservable<TSource>, Func<IObservable<TWindowClosing>>)
Either it can't be done using this overload, or I can't wrap my head around it. The documentation says it does exactly what I want but does not show an example. Also, it shows an example of non-deterministic closing, which depends on when the closing observable emits items.
The Window operator breaks up an observable sequence into consecutive
non-overlapping windows. The end of the current window and start of
the next window is controlled by an observable sequence which is the
result of the windowClosingSelect function which is passed as an input
parameter to the operator. The operator could be used to group a set
of events into a window. For example, states of a transaction could be
the main sequence being observed. Those states could include:
Preparing, Prepared, Active, and Committed/Aborted. The main sequence
could include all of those states as they occur in that order. The
windowClosingSelect function could return an observable sequence that
only produces a value on the Committed or Abort states. This would
close the window that represented transaction events for a particular
transaction.
I'm thinking something like the following would do the job, but I'd have to implement it myself:
Window<TSource, TWindowClosing>(IObservable<TSource>, Func<TSource, bool>)
Is such windowing possible with built-in functions (I know I can build one myself)?
Is it possible to close a window based on an emitted item, or only non-deterministically, once an item is emitted from the windowing observable?
Use the original sequence with a Where clause as your closing sequence. If your source sequence is cold, then make use of Publish and RefCount to make it work correctly.
var source = ...;
var sharedSource = source.Publish().RefCount();
var closingSignal = sharedSource.Where(i => (i % 7) == 0);
var windows = sharedSource.Window(() => closingSignal);
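The following sketch shows the intended grouping (close a window at each multiple of 7). It is a plain-Java, non-reactive illustration over a finite list; windows is a hypothetical helper, not an Rx API. The sketch puts the element that triggers the close at the end of the window it closes. That is a choice made here. In the Rx query, which window receives the boundary element depends on the order in which Window subscribes to the shared source and to the closing signal, so check it against your Rx version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PredicateWindows {
    // Splits items into consecutive, non-overlapping windows and closes the
    // current window right after an item that satisfies closeAfter.
    static <T> List<List<T>> windows(Iterable<T> items, Predicate<T> closeAfter) {
        List<List<T>> result = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (T item : items) {
            current.add(item);
            if (closeAfter.test(item)) {
                result.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // the last window, still open when the input ends
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 15).boxed().collect(Collectors.toList());
        System.out.println(windows(numbers, n -> n % 7 == 0));
        // [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14], [15]]
    }
}
```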

How to buffer based on time and count, but stopping the timer if no events occur

I'm producing a sequence of 50 items every three seconds. I then want to batch them into groups of at most 20 items, but without waiting more than one second before releasing the buffer.
That works great!
But since the interval never dies, Buffer keeps firing empty batch chunks...
How can I avoid that? Sure, Where(buf => buf.Count > 0) should help, but that seems like a hack.
Observable
.Interval(TimeSpan.FromSeconds(3))
.Select(n => Observable.Repeat(n, 50))
.Merge()
.Buffer(TimeSpan.FromSeconds(1), 20)
.Subscribe(e => Console.WriteLine(e.Count));
Output:
0-0-0-20-20-10-0-20-20-10-0-0-20-20
The Where filter you propose is a sound approach, I'd go with that.
You could wrap the Buffer and Where into a single helper method named to make the intent clearer perhaps, but rest assured the Where clause is idiomatic Rx in this scenario.
Think of it this way; an empty Buffer is relaying information that no events occurred in the last second. While you can argue that this is implicit, it would require extra work to detect this if Buffer didn't emit an empty list. It just so happens it's not information you are interested in - so Where is an appropriate way to filter this information out.
A lazy timer solution
Following from your comment ("...the timer... be[ing] lazily initiated...") you can do this to create a lazy timer and omit the zero counts:
var source = Observable.Interval(TimeSpan.FromSeconds(3))
.Select(n => Observable.Repeat(n, 50))
.Merge();
var xs = source.Publish(pub =>
pub.Buffer(() => pub.Take(1).Delay(TimeSpan.FromSeconds(1))
.Merge(pub.Skip(19)).Take(1)));
xs.Subscribe(x => Console.WriteLine(x.Count));
Explanation
Publishing
This query requires subscribing to the source events multiple times. To avoid unexpected side-effects, we use Publish to give us pub which is a stream that multicasts the source creating just a single subscription to it. This replaces the older Publish().RefCount() technique that achieved the same end, effectively giving us a "hot" version of the source stream.
In this case, this is necessary to ensure the subsequent buffer closing streams produced after the first will start with the current events - if the source was cold they would start over each time. I wrote a bit about publishing here.
The main query
We use an overload of Buffer that accepts a factory function that is called for every buffer emitted to obtain an observable stream whose first event is a signal to terminate the current buffer.
In this case, we want to terminate the buffer when either the first event into the buffer has been there for a full second, or when 20 events have appeared from the source - whichever comes first.
To achieve this we Merge streams that describe each case - the Take(1).Delay(...) combo describes the first condition, and the Skip(19).Take(1) describes the second.
However, I would still test the performance of the simple Buffer/Where approach first, because I suspect this is overkill, though a lot depends on the precise details of the platform, the scenario, etc.
After using the accepted answer for quite a while, I would now suggest a different implementation (inspired by James's Skip/Take approach and this answer):
var source = Observable.Interval(TimeSpan.FromSeconds(3))
.Select(n => Observable.Repeat(n, 50))
.Merge();
var xs = source.BufferOmitEmpty(TimeSpan.FromSeconds(1), 20);
xs.Subscribe(x => Console.WriteLine(x.Count));
With an extension method BufferOmitEmpty like:
public static IObservable<IList<TSource>> BufferOmitEmpty<TSource>(this IObservable<TSource> observable, TimeSpan maxDelay, int maxBufferCount)
{
return observable
.GroupByUntil(x => 1, g => Observable.Timer(maxDelay).Merge(g.Skip(maxBufferCount - 1).Take(1).Select(x => 1L)))
.Select(x => x.ToArray())
.Switch();
}
It is 'lazy' because no groups are created as long as there are no elements on the source sequence, so there are no empty buffers. As in Tom's answer, there is another nice advantage over the Buffer/Where implementation: the buffer is started when the first element arrives. So elements that follow each other within the buffer time after a quiet period are processed in the same buffer.
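The closing rule described above can be checked offline: a buffer starts with its first element and closes after maxDelay or maxBufferCount, whichever comes first. The sketch below is a deterministic Java simulation, not Rx code. Assumptions: buffer is a hypothetical helper that works on arrival timestamps in milliseconds rather than a live stream, and an element arriving exactly at the deadline goes into the next buffer (a choice made here).

```java
import java.util.ArrayList;
import java.util.List;

public class LazyBuffer {
    // Simulates a "lazy" time-and-count buffer over arrival times (ms): a buffer
    // opens with its first element and closes once it holds maxCount elements or
    // maxDelayMillis after that first element. No element, no buffer, so empty
    // batches are never produced.
    static List<List<Long>> buffer(List<Long> arrivals, long maxDelayMillis, int maxCount) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long deadline = 0;
        for (long t : arrivals) {
            // The open buffer's timer fired before this element arrived.
            if (!current.isEmpty() && t >= deadline) {
                batches.add(current);
                current = new ArrayList<>();
            }
            // Lazy timer: it only starts when a buffer receives its first element.
            if (current.isEmpty()) {
                deadline = t + maxDelayMillis;
            }
            current.add(t);
            if (current.size() == maxCount) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        // Whatever is still open would be flushed by its pending timer.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Two bursts of 50 items, 3000 ms apart, as in the question.
        List<Long> arrivals = new ArrayList<>();
        for (int i = 0; i < 50; i++) arrivals.add(0L);
        for (int i = 0; i < 50; i++) arrivals.add(3000L);
        for (List<Long> batch : buffer(arrivals, 1000, 20)) {
            System.out.print(batch.size() + " ");
        }
        System.out.println(); // prints: 20 20 10 20 20 10
    }
}
```

Unlike the fixed-interval Buffer in the question, the quiet periods between bursts produce no zero-sized batches.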
Why not to use the Buffer method
Three problems occurred when I was using the Buffer approach (they might be irrelevant for the scope of the question, so this is a warning to people who, like me, use Stack Overflow answers in different contexts):
1. Because of the Delay, one thread is used per subscriber.
2. In scenarios with long-running subscribers, elements from the source sequence can be lost.
3. With multiple subscribers, it sometimes creates buffers with a count greater than maxBufferCount.
(I can supply sample code for 2 and 3, but I'm unsure whether to post it here or in a different question, because I cannot fully explain why it behaves this way.)
RxJS 5 has hidden features buried in its source code. It turns out this is pretty easy to achieve with bufferTime.
From the source code, the signature looks like this:
export function bufferTime<T>(this: Observable<T>, bufferTimeSpan: number, bufferCreationInterval: number, maxBufferSize: number, scheduler?: IScheduler): Observable<T[]>;
So your code would be like this:
observable.bufferTime(1000, null, 20)