Combine's subscribe(on:options:) operator - swift

I have a question about the subscribe(on:options:) operator. I would appreciate it if anyone could help me figure it out.
So what we have from the documentation:
Specifies the scheduler on which to perform subscribe, cancel, and request operations.
In contrast with receive(on:options:), which affects downstream messages, subscribe(on:options:) changes the execution context of upstream messages.
Also, what I got from different articles is that unless we explicitly specify the Scheduler to receive our downstream messages on (using receive(on:options:)), messages will be sent on the Scheduler used for receiving a subscription.
This information is not aligned with what I am actually getting during the execution.
I have the following code:
Just("Some text")
    .map { _ in
        print("Map: \(Thread.isMainThread)")
    }
    .subscribe(on: DispatchQueue.global())
    .sink { _ in
        print("Sink: \(Thread.isMainThread)")
    }
    .store(in: &subscriptions)
I would expect the following output:
Map: false
Sink: false
But instead I am getting:
Map: true
Sink: false
The same thing happens when I use the Sequence publisher.
If I swap the positions of the map operator and the subscribe operator, I get what I want:
Just("Some text")
    .subscribe(on: DispatchQueue.global())
    .map { _ in
        print("Map: \(Thread.isMainThread)")
    }
    .sink { _ in
        print("Sink: \(Thread.isMainThread)")
    }
    .store(in: &subscriptions)
Output:
Map: false
Sink: false
Interestingly, when I use the same order of operators as in my first listing with a custom publisher, I get the behaviour I want:
struct TestJust<Output>: Publisher {
    typealias Failure = Never
    private let value: Output
    init(_ output: Output) {
        self.value = output
    }
    func receive<S>(subscriber: S) where S : Subscriber, Failure == S.Failure, Output == S.Input {
        subscriber.receive(subscription: Subscriptions.empty)
        _ = subscriber.receive(value)
        subscriber.receive(completion: .finished)
    }
}
TestJust("Some text")
    .map { _ in
        print("Map: \(Thread.isMainThread)")
    }
    .subscribe(on: DispatchQueue.global())
    .sink { _ in
        print("Sink: \(Thread.isMainThread)")
    }
    .store(in: &subscriptions)
Output:
Map: false
Sink: false
So I think either I totally misunderstand these mechanisms, or some publishers intentionally choose the thread on which they publish values (Just and Sequence on the main thread, URLSession.DataTaskPublisher on some background thread). The latter does not make sense to me, because in that case what would we need subscribe(on:options:) for?
Could you please help me understand what I am missing? Thank you in advance.

The first thing to understand is that messages flow both up a pipeline and down a pipeline. Messages that flow up a pipeline ("upstream") are:
The actual performance of the subscription (receive subscription)
Requests from a subscriber to the upstream publisher asking for a new value
Cancel messages (these percolate upwards from the final subscriber)
Messages that flow down a pipeline ("downstream") are:
Values
Completions, consisting of either a failure (error) or completion-in-good-order (reporting that the publisher emitted its last value)
Okay, well, as the documentation clearly states, subscribe(on:) is about the former: messages that flow upstream. But you are not actually tracking any of those messages in your tests, so none of your results reflect any information about them! Insert an appropriate handleEvents operator above the subscription point to see stuff flow upwards up the pipeline (e.g. implement its receiveRequest: parameter):
Just("Some text")
    .handleEvents(receiveRequest: { _ in
        print("Handle1: \(Thread.isMainThread)")
    })
    .map // etc.
Meanwhile, you should make no assumptions about the thread on which messages will flow downstream (i.e. values and completions). You say:
"Also, what I got from different articles is that unless we explicitly specify the Scheduler to receive our downstream messages on (using receive(on:options:)), messages will be sent on the Scheduler used for receiving a subscription."
But that seems like a bogus assumption. And nothing about your code determines the downstream-sending thread in a clear way. As you rightly say, you can take control of this with receive(on:), but if you don't, I would say you must assume nothing about the matter. Some publishers certainly do produce a value on a background thread, such as the data task publisher, which makes perfect sense (the same thing happens with a data task completion handler). Others don't.
What you can assume is that operators other than receive(on:) will not generally alter the value-passing thread. But whether and how an operator will use the subscription thread to determine the receive thread, that is something you should assume nothing about. To take control of the receive thread, take control of it! Call receive(on:) or assume nothing.
Just to give an example, if you change your opening to
Just("Some text")
    .receive(on: DispatchQueue.main)
then both your map and your sink will report that they are receiving values on the main thread. Why? Because you took control of the receive thread. This works regardless of what you may say in any subscribe(on:) commands. They are different matters entirely.
Maybe if you call subscribe(on:) but you don't call receive(on:), some things about the downstream-sending thread are determined by the subscribe(on:) thread, but I sure wouldn't rely on there being any hard and fast rules about it; there's nothing saying that in the documentation! Instead, don't do that. If you implement subscribe(on:), implement receive(on:) too so that you are in charge of what happens.
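As a minimal sketch of the advice above, the following pipeline observes an upstream message with handleEvents(receiveRequest:) and pins the downstream side with receive(on:). The uppercased() transform, the sinkRanOnMain flag and the one-second run-loop wait are illustrative assumptions, not part of the original question; the exact thread reported for the request depends on the scheduler and is not asserted here.

```swift
import Combine
import Foundation

var subscriptions = Set<AnyCancellable>()
var sinkRanOnMain = false  // illustrative flag, set from the sink

Just("Some text")
    // Requests travel upstream through this operator on their way to Just.
    .handleEvents(receiveRequest: { _ in
        print("Request, main thread: \(Thread.isMainThread)")
    })
    .map { $0.uppercased() }
    // Governs where subscribe, request and cancel are performed.
    .subscribe(on: DispatchQueue.global())
    // Governs where values and completions are delivered from here on down.
    .receive(on: DispatchQueue.main)
    .sink { value in
        sinkRanOnMain = Thread.isMainThread
        print("Sink, main thread: \(Thread.isMainThread), value: \(value)")
    }
    .store(in: &subscriptions)

// Only needed in a script or command-line tool: give the main queue time to run.
RunLoop.main.run(until: Date().addingTimeInterval(1))
```

Because receive(on:) sits directly above the sink, the sink's thread no longer depends on whatever the subscribe(on:) scheduler happens to do.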

Related

Executing 2 parallel network requests using Swift Combine

I am trying to load data from two different endpoints using two different publishers which have different return types. I need to update the UI when both requests complete, but both requests can also fail so Zip doesn't do the trick. Usually I would use a DispatchGroup to accomplish this, but I have not figured out how to do that using Combine. Is there a way to use DispatchGroup with Combine?
let dispatchGroup: DispatchGroup = .init()
let networkQueue: DispatchQueue = .init(label: "network", qos: .userInitiated)
dispatchGroup.notify(queue: .main) { print("work all done!") }
publisher
    .receive(on: networkQueue, options: .init(group: dispatchGroup))
    .sink { ... } receiveValue: { ... }
    .store(in: &cancellables)
publisher2
    .receive(on: networkQueue, options: .init(group: dispatchGroup))
    .sink { ... } receiveValue: { ... }
    .store(in: &cancellables)
The notify is immediately executed. Is this not the right way of doing this?
You'll want to use the Publishers.CombineLatest which will take the two publishers and create a new publisher, with the result of the latest value from both streams:
Publishers.CombineLatest(publisher, publisher2)
    // Receive values on the main queue (you decide whether you want to do this)
    .receive(on: DispatchQueue.main)
    .sink(receiveCompletion: { completion in
        // Handle error / completion
        // If either stream produces an error, the error will be forwarded in here
    }, receiveValue: { value1, value2 in
        // value1 will be the value of publisher's Output type
        // value2 will be the value of publisher2's Output type
    })
    // You only need to store this subscription - not publisher and publisher2 individually
    .store(in: &cancellables)
The Publishers.CombineLatest publisher can be seen as the equivalent of using a DispatchGroup, where you call dispatchGroup.enter() for each network operation you initiate. However, one key difference is that the CombineLatest publisher will produce more than one value if any of the publishers produces more than one value. For normal network operations, you don't need to worry about this. But if you find yourself in a situation where you only need the first or the first N values produced by the combined publisher, you can use the prefix(_:) operator, which makes sure that you never receive more than N values.
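To illustrate the prefix(_:) point, here is a small sketch that uses two PassthroughSubject instances as stand-ins for the two network publishers. The subject names and the string formatting are assumptions made for the example.

```swift
import Combine

// Stand-ins for the two network publishers (hypothetical names).
let first = PassthroughSubject<Int, Never>()
let second = PassthroughSubject<String, Never>()
var received: [String] = []

let cancellable = Publishers.CombineLatest(first, second)
    .prefix(1)  // keep only the first combined value, then complete
    .sink { number, text in
        received.append("\(number)-\(text)")
    }

first.send(1)     // nothing is emitted yet: second has no value
second.send("a")  // both sides have a value, so one pair is emitted
first.send(2)     // arrives after prefix(1) has completed the pipeline

print(received)
```

Without prefix(1), the last send would produce a second pair built from the new number and the latest string.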
EDIT: Updated to fix typo in code.

How to apply back pressure with Combine buffer operator to avoid flatMap to ask an infinite demand upstream?

I'm trying to use Combine to make several million concurrent requests over the network. Here is a mock-up of the naive approach I'm using:
import Foundation
import Combine
let cancellable = (0..<1_000_000).publisher
    .map(some_preprocessing)
    .flatMap(maxPublishers: .max(32)) { request in
        URLSession.shared.dataTaskPublisher(for: request)
            .map(\.data)
            .catch { _ in
                return Just(Data())
            }
    }
    .sink { completion in
        print(completion)
    } receiveValue: { value in
        print(value)
    }
// Required in a command line tool
sleep(100)
This pipeline first creates a request, then the request is performed in flatMap to confine errors. Also, flatMap merges several requests so they are effectively done concurrently, which is great.
The issue is that it will literally make 1,000,000 requests concurrently, so I added the parameter maxPublishers, which limits the number of publishers that are subscribed at the same time in flatMap. This kind of works: only 32 publishers are active at the same time, but unfortunately some_preprocessing is still performed 1,000,000 times before flatMap is executed.
I expected flatMap(maxPublishers: .max(32)) to apply some back pressure, i.e. to request items from the upstream map publisher only when fewer than 32 publishers are active. This does not seem to be the case, and it fills up the RAM rapidly and delays the processing.
I then tried to use the buffer operator, which is meant to introduce back pressure between a producer and a consumer, but Apple's documentation is so poor that I don't understand how it works (more specifically the prefetch strategy argument).
So I tried different combinations such as:
import Foundation
import Combine
let cancellable = (0..<1_000_000).publisher
    .map(some_preprocessing)
    .buffer(size: 32, prefetch: .byRequest, whenFull: .dropNewest)
    .flatMap(maxPublishers: .max(32)) { request in
        URLSession.shared.dataTaskPublisher(for: request)
            .map(\.data)
            .catch { _ in
                return Just(Data())
            }
    }
    .sink { completion in
        print(completion)
    } receiveValue: { value in
        print(value)
    }
// Required in a command line tool
sleep(100)
This does not seem to do anything useful, though: flatMap still requests as many elements as it can.
How do I properly apply back pressure in this case? I.e. I need the upstream map publisher to "wait" for demand from the downstream flatMap publisher, which should only ask for items when it has an empty slot.
The issue appears to be a Combine bug, as pointed out here. Using Publishers.Sequence causes the following operator to accumulate every value sent downstream before proceeding.
A workaround is to type-erase the sequence publisher:
import Foundation
import Combine
let cancellable = (0..<1_000_000).publisher
    .eraseToAnyPublisher() // <----
    .map(some_preprocessing)
    .flatMap(maxPublishers: .max(32)) { request in
        URLSession.shared.dataTaskPublisher(for: request)
            .map(\.data)
            .catch { _ in
                return Just(Data())
            }
    }
    .sink { completion in
        print(completion)
    } receiveValue: { value in
        print(value)
    }
// Required in a command line tool without a run loop
sleep(.max)
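The back pressure that flatMap(maxPublishers:) is expected to apply can be observed on a small scale. The sketch below is an illustration only: it replaces the sequence publisher and the network calls with PassthroughSubject stand-ins (requests and responses are hypothetical names) and logs the demand that flatMap sends upstream. It does not reproduce the Publishers.Sequence issue described above.

```swift
import Combine

// Hypothetical stand-ins: an upstream of request indices and two "network calls".
let requests = PassthroughSubject<Int, Never>()
let responses = [PassthroughSubject<String, Never>(),
                 PassthroughSubject<String, Never>()]
var received: [String] = []

let cancellable = requests
    // Log every demand that flatMap sends upstream.
    .handleEvents(receiveRequest: { demand in
        print("flatMap requested: \(demand)")
    })
    .flatMap(maxPublishers: .max(1)) { index in responses[index] }
    .sink { received.append($0) }

requests.send(0)                          // fills the single available slot
requests.send(1)                          // sent while the slot is still busy
responses[0].send("first")
responses[0].send(completion: .finished)  // frees the slot again
requests.send(1)                          // taken now that a slot is free
responses[1].send("second")

print(received)
```

With maxPublishers set to .max(1), the logged demand should arrive one element at a time, which is the behaviour the question hoped to see from the sequence publisher.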

Swift Combine erase array of publishers into AnyCancellable

Is it possible to fire multiple requests which return a Publisher and be able to cancel them without sink?
I would like to combine the requests into a single cancellable reference or store each one if possible without sink (code below). Is this possible?
func fetchDetails(for contract: String) -> AnyPublisher<String, Error>
Fire Multiple requests and store
@State var cancellable: Set<AnyCancellable> = []
let promises = items.map {
    self.fetchFeed.fetchDetails(for: $0.contract)
}
Publishers.MergeMany(promises)
    .sink(receiveCompletion: { _ in }, receiveValue: { _ in }) // ** is this required?
    .store(in: &cancellable)
It really depends on what fetchDetails does to create the publisher. Almost every publisher provided by Apple has no side effects until you subscribe to it. For example, the following publishers have no side effects until you subscribe to them:
NSObject.KeyValueObservingPublisher (returned by NSObject.publisher(for:options:))
NotificationCenter.Publisher (returned by NotificationCenter.publisher(for:object:))
Timer.TimerPublisher (returned by Timer.publish(every:tolerance:on:in:options:))
URLSession.DataTaskPublisher (returned by URLSession.dataTaskPublisher(for:))
The synchronous publishers like Just, Empty, Fail, and Sequence.Publisher.
In fact, the only publisher that has side effects on creation, as far as I know, is Future, which runs its closure immediately on creation. This is why you'll often see the Deferred { Future { ... } } construction: to avoid immediate side effects.
So, if the publisher returned by fetchDetails behaves like most publishers, you must subscribe to it to make any side effects happen (like actually sending a request over the network).
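The difference between Future and Deferred { Future { ... } } can be seen in a short sketch. The log array and the string labels are assumptions made for illustration.

```swift
import Combine

var log: [String] = []

// Future runs its closure as soon as it is created, subscriber or not.
let eager = Future<Int, Never> { promise in
    log.append("eager ran")
    promise(.success(1))
}

// Deferred postpones creating the Future until someone subscribes.
let lazy = Deferred {
    Future<Int, Never> { promise in
        log.append("lazy ran")
        promise(.success(2))
    }
}

print(log)  // only the eager closure has run at this point

let cancellable = lazy.sink { _ in }

print(log)  // the deferred closure ran when sink subscribed
```

So if fetchDetails wraps its work in Deferred (or uses a publisher such as the data task publisher), nothing is sent until sink or another subscriber is attached.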

With Combine, how to deallocate the Subscription after a network request

If you use Combine for network requests with URLSession, then you need to save the Subscription (aka, the AnyCancellable) - otherwise it gets immediately deallocated, which cancels the network request. Later, when the network response has been processed, you want to deallocate the subscription, because keeping it around would be a waste of memory.
Below is some code that does this. It's kind of awkward, and it may not even be correct. I can imagine a race condition where network request could start and complete on another thread before sub is set to the non-nil value.
Is there a nicer way to do this?
class SomeThing {
    var subs = Set<AnyCancellable>()
    func sendNetworkRequest() {
        var request: URLRequest = ...
        var sub: AnyCancellable? = nil
        sub = URLSession.shared.dataTaskPublisher(for: request)
            .map(\.data)
            .decode(type: MyResponse.self, decoder: JSONDecoder())
            .sink(
                receiveCompletion: { completion in
                    self.subs.remove(sub!)
                },
                receiveValue: { response in ... }
            )
        subs.insert(sub!)
    }
}
I call this situation a one-shot subscriber. The idea is that, because a data task publisher publishes only once, you know for a fact that it is safe to destroy the pipeline after you receive your single value and/or completion (error).
Here's a technique I like to use. First, here's the head of the pipeline:
let url = URL(string:"https://www.apeth.com/pep/manny.jpg")!
let pub : AnyPublisher<UIImage?,Never> =
    URLSession.shared.dataTaskPublisher(for: url)
        .map {$0.data}
        .replaceError(with: Data())
        .compactMap { UIImage(data:$0) }
        .receive(on: DispatchQueue.main)
        .eraseToAnyPublisher()
Now comes the interesting part. Watch closely:
var cancellable: AnyCancellable? // 1
cancellable = pub.sink(receiveCompletion: {_ in // 2
    cancellable?.cancel() // 3
}) { image in
    self.imageView.image = image
}
Do you see what I did there? Perhaps not, so I'll explain it:
First, I declare a local AnyCancellable variable; for reasons having to do with the rules of Swift syntax, this needs to be an Optional.
Then, I create my subscriber and set my AnyCancellable variable to that subscriber. Again, for reasons having to do with the rules of Swift syntax, my subscriber needs to be a Sink.
Finally, in the subscriber itself, I cancel the AnyCancellable when I receive the completion.
The cancellation in the third step actually does two things quite apart from calling cancel() — things having to do with memory management:
By referring to cancellable inside the asynchronous completion function of the Sink, I keep cancellable and the whole pipeline alive long enough for a value to arrive from the publisher.
By cancelling cancellable, I permit the pipeline to go out of existence and prevent a retain cycle that would cause the surrounding view controller to leak.
"Below is some code that does this. It's kind of awkward, and it may not even be correct. I can imagine a race condition where network request could start and complete on another thread before sub is set to the non-nil value."
Danger! Swift.Set is not thread safe. If you want to access a Set from two different threads, it is up to you to serialize the accesses so they don't overlap.
What is possible in general (although not perhaps with URLSession.DataTaskPublisher) is that a publisher emits its signals synchronously, before the sink operator even returns. This is how Just, Result.Publisher, Publishers.Sequence, and others behave. So those produce the problem you're describing, without involving thread safety.
Now, how to solve the problem? If you don't think you want to actually be able to cancel the subscription, then you can avoid creating an AnyCancellable at all by using Subscribers.Sink instead of the sink operator:
URLSession.shared.dataTaskPublisher(for: request)
    .map(\.data)
    .decode(type: MyResponse.self, decoder: JSONDecoder())
    .subscribe(Subscribers.Sink(
        receiveCompletion: { completion in ... },
        receiveValue: { response in ... }
    ))
Combine will clean up the subscription and the subscriber after the subscription completes (with either .finished or .failure).
But what if you do want to be able to cancel the subscription? Maybe sometimes your SomeThing gets destroyed before the subscription is complete, and you don't need the subscription to complete in that case. Then you do want to create an AnyCancellable and store it in an instance property, so that it gets cancelled when SomeThing is destroyed.
In that case, set a flag indicating that the sink won the race, and check the flag before storing the AnyCancellable.
var sub: AnyCancellable? = nil
var isComplete = false
sub = URLSession.shared.dataTaskPublisher(for: request)
    .map(\.data)
    .decode(type: MyResponse.self, decoder: JSONDecoder())
    // This ensures thread safety, if the subscription is also created
    // on DispatchQueue.main.
    .receive(on: DispatchQueue.main)
    .sink(
        receiveCompletion: { [weak self] completion in
            isComplete = true
            if let theSub = sub {
                self?.subs.remove(theSub)
            }
        },
        receiveValue: { response in ... }
    )
if !isComplete {
    subs.insert(sub!)
}
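The "sink won the race" case is easy to reproduce with a synchronous publisher. The sketch below assumes Just as a stand-in for the network publisher and a top-level cancellables set in place of the instance property; both are illustrative choices.

```swift
import Combine

var cancellables = Set<AnyCancellable>()
var isComplete = false
var sub: AnyCancellable? = nil

// Just delivers its value and completion synchronously, before sink returns.
sub = Just(1)
    .sink(
        receiveCompletion: { _ in
            isComplete = true
            // sub is still nil when the completion arrives synchronously.
            if let theSub = sub {
                cancellables.remove(theSub)
            }
        },
        receiveValue: { value in
            print("Received \(value)")
        }
    )

// Store the subscription only if the pipeline is still running.
if !isComplete, let theSub = sub {
    cancellables.insert(theSub)
}

print("Stored subscriptions: \(cancellables.count)")
```

Without the isComplete check, the finished subscription would be inserted after its completion handler had already run and would never be removed.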
Combine publishers have an instance method called prefix which does this:
func prefix(_ maxLength: Int) -> Publishers.Output<Self>
https://developer.apple.com/documentation/combine/publisher/prefix(_:)

How to drop new elements if an observer is busy?

I have an observable which regularly emits elements. On those elements, I perform one fast and one slow operation. What I want is to drop new elements for the slow observer while it is busy. Is there any way to achieve this with Rx instead of keeping a flag in the slow operation?
I am very new to Reactive Extensions; please correct me if any of my assumptions are wrong.
let tick = Observable<Int>.interval(.seconds(1),
    scheduler: SerialDispatchQueueScheduler(qos: .background)).share()
tick.subscribe {
    print("fast observer \($0)")
}.disposed(by: disposeBag)
// observing in another queue so that it does not block the source
tick.observeOn(SerialDispatchQueueScheduler(qos: .background))
    .subscribe {
        print("slow observer \($0)")
        sleep(3) // cpu-intensive task
    }.disposed(by: disposeBag)
For this, flatMap is your friend. Whenever you want to drop events (either the current one when a new one comes in, or subsequent ones while working on the current one) use flatMap. More information can be found in my article: RxSwift’s Many Faces of FlatMap
Here you go:
let tick = Observable<Int>.interval(.seconds(1), scheduler: MainScheduler.instance).share()
func cpuLongRunningTask(_ input: Int) -> Observable<Int> {
    return Observable.create { observer in
        print("start task")
        sleep(3)
        print("finish task")
        observer.onNext(input)
        observer.onCompleted()
        return Disposables.create { /* cancel the task if possible */ }
    }
}
tick
    .subscribe {
        print("fast \($0)")
    }
    .disposed(by: disposeBag)
tick
    .flatMapFirst {
        // subscribing in another scheduler so that it does not block the source
        cpuLongRunningTask($0)
            .subscribeOn(SerialDispatchQueueScheduler(qos: .background))
    }
    .observeOn(MainScheduler.instance) // make sure the print happens on the main thread
    .subscribe {
        print("slow \($0)")
    }
    .disposed(by: disposeBag)
Sample output as follows:
fast next(0)
start task
fast next(1)
fast next(2)
fast next(3)
finish task
slow next(0)
fast next(4)
start task
fast next(5)
fast next(6)
fast next(7)
finish task
slow next(4) <-- slow ignored the 1, 2, and 3 values.
I'm afraid there is not a straightforward solution. The issue you describe is related to backpressure and unfortunately, RxSwift does not provide support for it (Apple Combine does). Usually, you will have to handle this situation manually by using one of the filtering operators: debounce, throttle or filter.
By using debounce or throttle you would need to know the exact duration of the operation which probably is not always the case.
By using filter, as you said, you could check for a flag you set before starting the long-running operation.