Buffering slow subscribers in Swift Combine

I'm currently struggling to get a desired behaviour when using Combine. I've previously used the Rx framework and believe (from what I remember) that the scenario described below is possible there by specifying a backpressure strategy for buffering.
So the issue I have is that I have a publisher that publishes values very rapidly, I have two subscribers to it, one which can react just as fast as the values are published (cool beans), but then a second subscriber that runs some CPU expensive processing.
I know that in order to support the second, slower subscriber I need to allow buffering of values, but I don't seem to be able to make this happen. Here is what I have so far:
let subject = PassthroughSubject<Int, Never>()

// publish some values
Task {
    for i in 0... {
        subject.send(i)
    }
}

// note: the AnyCancellables returned by sink must be retained,
// or the subscriptions are cancelled immediately
let fast = subject
    .print("fast")
    .sink { _ in }

let slow = subject
    .map { n -> Int in
        sleep(1) // CPU intensive work here
        return n
    }
    .print("slow")
    .sink { _ in }
Originally I thought I could use .buffer(..) on the slow subscriber, but that doesn't appear to cover this use case. What seems to happen is that the subject dispatches to each subscriber and only demands more from the publisher after the subscriber finishes, which in this case blocks the .send(..) call in the publishing loop.
Any advice would be greatly appreciated 👍
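For reference, the usual remedy is to put a bounded buffer between the fast producer and the slow subscriber so that the slow side's demand never blocks send(..); in Combine itself the closest analogue is buffer(size:prefetch:whenFull:) placed before the expensive operator. The mechanics are framework-agnostic, so here is a minimal sketch of the idea in plain Python (the buffer size, sentinel, and sleep duration are arbitrary choices for illustration):

```python
import queue
import threading
import time

buf = queue.Queue(maxsize=100)   # bounded buffer decoupling producer from consumer

def producer():
    # fast side: put() only blocks once the buffer is actually full
    for i in range(10):
        buf.put(i)
    buf.put(None)                # sentinel marking the end of the stream

received = []

def slow_consumer():
    while (item := buf.get()) is not None:
        time.sleep(0.001)        # stands in for the CPU-intensive work
        received.append(item)

threads = [threading.Thread(target=producer), threading.Thread(target=slow_consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The producer finishes as soon as its values are parked in the buffer, while the consumer drains them at its own pace.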

Related

RxSwift - How does merge handle simultaneous events?

So I have a code that looks something like this:
let a = Observable.just("A").delay(.seconds(3), scheduler: MainScheduler.instance)
let b = Observable.just("B").delay(.seconds(3), scheduler: MainScheduler.instance)
Observable.merge([a, b]).subscribe(onNext: { print($0) })
I thought the printed order would be random, since whichever delay finishes marginally earlier would be emitted first by merge. But the order seems to be determined solely by the order of the variables in the array passed to merge.
That means .merge([a, b]) prints
A
B
but .merge([b, a]) prints
B
A
Why is this the case? Does it mean that merge somehow handles concurrent events, and can I rely on events with the same delay always being emitted in their array order?
I know I could easily solve this by using .from(["A", "B"]), but I'm now curious to know how exactly merge works.
You put both delays on the same thread (and I suspect the subscriptions also happen on that thread), so they will execute on that thread in the order they are subscribed. Since the current implementation of merge subscribes to the observables in the order it receives them, you get the behaviour you are seeing. However, it's important to note that you cannot rely on this; there is no guarantee it will hold in every version of the library.
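The single-thread determinism described above can be reproduced outside Rx. A small Python asyncio sketch (all names invented for illustration) runs two equal delays on one event loop; they tend to complete in the order they were scheduled, but that is an implementation detail, just as it is for merge:

```python
import asyncio

async def delayed(label, results):
    # both "sources" use the same delay, like the two 3-second Rx delays
    await asyncio.sleep(0.01)
    results.append(label)

async def merged(order):
    results = []
    # gather "subscribes" to the coroutines in the order given, like merge([a, b])
    await asyncio.gather(*(delayed(label, results) for label in order))
    return results

ab = asyncio.run(merged(["A", "B"]))
ba = asyncio.run(merged(["B", "A"]))
# on CPython the output tends to mirror the scheduling order,
# but nothing in the API contract guarantees it
```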

Swift Combine: One publisher consumes another, how to get both streams back out

I'd like some help understanding why my publishers aren't emitting elements through the combineLatest operator. I have a publisher that emits video frames, and another publisher that consumes these video frames and extracts faces from these frames. I'm now trying to combine the original video frames and the transformed output into one using combineLatest (I am using some custom publishers to extract video frames and transform the frames):
let videoPublisher = VideoPublisher // custom publisher that outputs CVImageBuffers
    .share()
let faceDetectionPublisher = videoPublisher
    .detectFaces() // custom operator that takes in video frames and outputs an array of VNFaceObservations
let featurePublisher = videoPublisher.combineLatest(faceDetectionPublisher)
    .sink(receiveCompletion: { _ in
        print("done")
    }, receiveValue: { (video, faces) in
        print("video", video)
        print("faces", faces)
    })
I'm not getting any activity out of combineLatest, however. After some debugging, I think the issue is that all the video frames from videoPublisher are published before any can successfully flow through faceDetectionPublisher. If I attach print statements to the end of videoPublisher and faceDetectionPublisher, I can see output from the former but none from the latter. I've read up on Combine and other techniques such as multicasting, but haven't figured out a working solution. I'd love any Combine expertise or guidance on how to better understand the framework!
Your combineLatest won't emit anything until each of its sources emits at least one value. Since detectFaces() never emits, your chain is stalling. Either something is wrong inside your detectFaces() operator, or there are no faces to detect, in which case your logic is off.
If it's the latter, use prepend on the result of detectFaces() to seed the pipeline with some default value (maybe an empty array?)
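To make that behaviour concrete, here is a toy Python model of combineLatest's bookkeeping (the class and source names are invented for illustration): nothing is emitted until every source has delivered at least one value, and seeding the face stream with an empty array immediately unblocks it:

```python
class CombineLatestModel:
    """Toy model of combineLatest over a set of named sources."""
    def __init__(self, *sources):
        self.sources = set(sources)
        self.latest = {}
        self.emitted = []

    def on_value(self, source, value):
        self.latest[source] = value
        # emit only once every source has produced at least one value
        if self.sources <= self.latest.keys():
            self.emitted.append(tuple(self.latest[s] for s in sorted(self.sources)))

cl = CombineLatestModel("faces", "video")
cl.on_value("video", "frame1")   # nothing yet: "faces" has no value
cl.on_value("video", "frame2")   # still nothing
cl.on_value("faces", [])         # prepend-style seed unblocks the pipeline
cl.on_value("video", "frame3")
```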

How to consume infinite stream in Kotlin?

I want to write a program that will consume an infinite stream from the web. The stream will come in as JSON over web sockets. I'm looking for the data structure I should use.
Requirements are:
The stream will be infinite. I don't ever want to stop listening for the new data.
The time between new events in the stream is unknown. They can come rapidly one after another, but also it's possible to have big pauses of several hours. I want to consume when there is something, and wait in the meantime.
I want to consume events sequentially, one after another.
My component should only transform the consumed events and forward them on. I tried something like this:
fun consume(stream: Stream<WebEvent>): Sequence<TransformedEvent> {
    return try {
        stream.asSequence().let { seq ->
            var currentEvent = generator.firstEvent(seq.first())
            seq.map {
                currentEvent = generator.nextEvent(currentEvent, it)
                return@map currentEvent
            }
        }
    } catch (e: NoSuchElementException) {
        throw EmptyStreamException(e)
    }
}
The generator in this example "needs" the "previous" event to generate the new one, but that's part of the transformation logic. I'm interested in consuming the stream.
This worked, but I'm wondering whether there is a better way to do it in Kotlin. Maybe with a blocking queue, or something like that.
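For comparison, the stateful one-pass transform above is essentially a scan: each output is derived from the previous output plus the next input. A lazy Python sketch with itertools.accumulate shows the same shape (next_event is a stand-in for generator.nextEvent, and addition is an arbitrary example transform); in Kotlin, Sequence.runningFold, or scan on a kotlinx.coroutines Flow, expresses it directly:

```python
from itertools import accumulate, count, islice

def next_event(prev, raw):
    # hypothetical transform standing in for generator.nextEvent(prev, raw)
    return prev + raw

stream = count(1)                             # stands in for the infinite event stream
transformed = accumulate(stream, next_event)  # lazy: pulls one event at a time
first_five = list(islice(transformed, 5))     # the stream itself never terminates
```

Like the Sequence version, nothing here buffers the whole stream; each event is consumed on demand.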

How to implement scheduler inheritance when another source tees into a pipeline?

I'd like to implement "scheduler inheritance" as part of an RxJava2-using API. I want consumers of my API to be able to think in terms of building a single processing chain rather than a DAG, even though, internally, new events are being teed in as an implementation detail.
I don't see any way to do the equivalent of:
observable
    .flatMap {
        val scheduler = Schedulers().current!!
        someOtherObservable
            .observeOn(scheduler)
    }
Is there some other way to inherit a scheduler?
More Context
I have a pipeline like:
compositeDisposable += Environment
    .lookupDeviceInfo()
    .subscribeOn(scheduler)
    .flatMap { deviceInfo ->
        Device(deviceId = deviceInfo.id)
            .sendCommand()
    }
    .subscribe(
        { result -> /* process result */ },
        { e -> /* log error */ })
To the consumer, this looks like they pushed all the work onto the specified scheduler: events from lookupDeviceInfo() get vectored to a worker from that scheduler, and they expect to stick on that worker.
In practice, they have a bug, because sendCommand() tees in events from another event source as an implementation detail:
sendMessageSingle(deviceId, payload)
    .flatMap { sentMessageId ->
        responseObservable
            .filter { it.messageId == sentMessageId }
            .firstOrError()
    }
Events stream in from responseObservable, but none of those events get vectored to the specified scheduler, because that got applied upstream.
From the comments:
Returning to the same scheduler thread requires you to provide a single-threaded scheduler (i.e., Schedulers.from(Executor), Schedulers.single(), etc.). There is no current scheduler because there is no guarantee some code will run on any of the standard schedulers; they could be executing on arbitrary threads of the system, other frameworks, etc. Thus, you have to route the signals back to the desired thread via observeOn.
I'm not concerned about landing on the same thread, just the same scheduler. (Even changing Workers may be fine, so long as the new worker is vended by the same scheduler as the old.)
Then the suggestion still applies and you can forego the "single-threaded" property I mentioned.
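A minimal sketch of that suggestion, in Python rather than RxJava since the idea is just thread bookkeeping: a single-worker executor plays the single-threaded scheduler, and the observeOn-style hop routes responses arriving on arbitrary threads back onto it (all names are invented for illustration):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

pipeline = ThreadPoolExecutor(max_workers=1, thread_name_prefix="pipeline")
seen = []

def on_response(payload):
    # responses arrive on arbitrary threads; hop back onto the pipeline worker,
    # which is what observeOn(singleThreadedScheduler) achieves in Rx terms
    pipeline.submit(lambda: seen.append((payload, threading.current_thread().name)))

responders = [threading.Thread(target=on_response, args=(i,)) for i in range(3)]
for t in responders:
    t.start()
for t in responders:
    t.join()
pipeline.shutdown(wait=True)
```

Every appended entry carries the same worker-thread name, regardless of which thread the response arrived on.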

Message processing throttling/backpressure

I have a source of messages, which is an Observable. For every message I would like to make an HTTP call, which produces another Observable, so I combine them together with flatMap and then sink the results to some subscriber. Here is the code for this scenario:
Rx.Observable.interval(1000)
    .flatMap (tick) ->
        # returns an `Observable`
        loadMessages()
    .flatMap (message) ->
        # also returns an `Observable`
        makeHttpRequest(message)
    .subscribe (result) ->
        console.info "Processed: ", result
This example is written in CoffeeScript, but I think the problem statement is valid for any other Rx implementation.
The issue I have with this approach is that loadMessages produces a lot of messages very quickly, which means I make a lot of HTTP requests in a very short period of time. This is not acceptable in my situation, so I would like to limit the number of parallel HTTP requests to 10 or so. In other words, I would like to throttle the pipeline, or apply some kind of backpressure, when making the HTTP requests.
Is there any standard approach or best practices for the Rx to deal with this kind of situations?
Currently I have implemented a very simple (and pretty suboptimal) backpressure mechanism that ignores a tick if the system has too many messages in processing. It looks like this (simplified version):
Rx.Observable.interval(1000)
    .filter (tick) ->
        stats.applyBackpressureBasedOnTheMessagesInProcessing()
    .do (tick) ->
        stats.messageIn()
    .flatMap (tick) ->
        # returns an `Observable`
        loadMessages()
    .flatMap (message) ->
        # also returns an `Observable`
        makeHttpRequest(message)
    .do (tick) ->
        stats.messageOut()
    .subscribe (result) ->
        console.info "Processed: ", result
I'm not sure though, whether this can be done better, or maybe Rx already has some mechanisms in-place to deal with this kind of requirements.
This isn't strictly backpressure; this is just limiting concurrency. Here's an easy way to do it (ignore my possibly wrong syntax, I'm coding via a text area):
Rx.Observable.interval(1000)
    .flatMap (tick) ->
        # returns an `Observable`
        loadMessages()
    .map (message) ->
        # also returns an `Observable`, but only when
        # someone first subscribes to it
        Rx.Observable.defer ->
            makeHttpRequest(message)
    .merge 10 # at a time
    .subscribe (result) ->
        console.info "Processed: ", result
In C#, the equivalent idea is: instead of SelectMany, it's Select(Defer(x)).Merge(n). Merge(int) subscribes to at most n in-flight Observables and buffers the rest until later. The reason we have the Defer is to ensure we don't do any work until Merge(n) subscribes to us.
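The same Defer-plus-Merge(n) cap can be sketched in Python with asyncio, where a semaphore plays the role of merge 10 (make_http_request is a stand-in for the real call, and the peak counter just demonstrates that the limit holds):

```python
import asyncio

async def make_http_request(message, state):
    # stands in for the real HTTP call; tracks how many run concurrently
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)
    state["active"] -= 1
    return f"processed {message}"

async def main(messages, limit=10):
    sem = asyncio.Semaphore(limit)   # the merge(10)-style concurrency cap
    state = {"active": 0, "peak": 0}

    async def deferred(m):
        # like Observable.defer: no work starts until a slot is available
        async with sem:
            return await make_http_request(m, state)

    results = await asyncio.gather(*(deferred(m) for m in messages))
    return results, state["peak"]

results, peak = asyncio.run(main(range(25)))
```

With 25 messages and a limit of 10, at most 10 requests are ever in flight; the rest wait their turn, which is exactly the buffering Merge(int) performs.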
In RXJS you can use the backpressure submodule
http://rxjs.codeplex.com/SourceControl/latest#src/core/backpressure/
Disclaimer: I have never used the JS version of Rx, but you did ask for a standard way of implementing backpressure, and the core library seems to have support for it. Rx for C# does not yet have this support; not sure why.
It sounds like you want to pull from a queue rather than push your HTTP requests. Is Rx really the right choice of technology here?
EDIT:
In general, I would not design a solution using Rx where I had complete imperative control over the source events. It's just not a reactive scenario.
The backpressure module in RxJS is clearly written to deal with situations where you don't own the source stream. Here you do.
TPL Dataflow sounds like a far better fit here.
If you must use Rx, you could set up a loop like this: if you want to limit yourself to X concurrent events, set up a Subject to act as your message source and imperatively push (OnNext) X messages into it. In your subscriber, push a new message to the subject on each iteration of the OnNext handler until the source is exhausted. This guarantees a maximum of X messages in flight.
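That refill loop is easy to model. Here is a toy synchronous Python sketch of the bookkeeping, where queue length stands in for messages in flight and all names are invented for illustration:

```python
from collections import deque

def process_with_refill(messages, x):
    """Prime a 'subject' with x messages; each OnNext pushes the next one."""
    source = deque(messages)
    subject = deque()
    processed = []
    peak_in_flight = 0
    for _ in range(min(x, len(source))):      # imperatively push the first X
        subject.append(source.popleft())
    while subject:
        peak_in_flight = max(peak_in_flight, len(subject))
        msg = subject.popleft()
        processed.append(msg)                 # the subscriber's OnNext handler...
        if source:
            subject.append(source.popleft())  # ...refills the subject
    return processed, peak_in_flight

out, peak = process_with_refill(range(20), 5)
```

The model processes every message exactly once while never holding more than X at a time; a real implementation would do the refill from the asynchronous OnNext callback instead of a loop.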