Why does `Publishers.Map` consume upstream values eagerly?

Suppose I have a custom subscriber that requests one value on subscription and then an additional value three seconds after it receives the previous value:
class MySubscriber: Subscriber {
    typealias Input = Int
    typealias Failure = Never

    private var subscription: Subscription?

    func receive(subscription: Subscription) {
        print("Subscribed")
        self.subscription = subscription
        subscription.request(.max(1))
    }

    func receive(_ input: Int) -> Subscribers.Demand {
        print("Value: \(input)")
        DispatchQueue.main.asyncAfter(deadline: .now() + .seconds(3)) {
            self.subscription?.request(.max(1))
        }
        return .none
    }

    func receive(completion: Subscribers.Completion<Never>) {
        print("Complete")
        subscription = nil
    }
}
If I use this to subscribe to an infinite range publisher, back pressure is handled gracefully, with the publisher waiting 3 seconds each time until it receives the next demand to send a value:
(1...).publisher.subscribe(MySubscriber())
// Prints values infinitely with ~3 seconds between each:
//
// Subscribed
// Value: 1
// Value: 2
// Value: 3
// ...
But if I add a map operator then MySubscriber never even receives a subscription; map appears to synchronously request Demand.unlimited upon receiving its subscription, and the app spins forever as map tries to exhaust the infinite range:
(1...).publisher
    .map { value in
        print("Map: \(value)")
        return value * 2
    }
    .subscribe(MySubscriber())
// The `map` transform is executed infinitely with no delay:
//
// Map: 1
// Map: 2
// Map: 3
// ...
My question is, why does map behave this way? I would have expected map to just pass its downstream demand to the upstream. Since map is supposed to be for transformation rather than side effects, I don't understand what the use case is for its current behavior.
EDIT
I implemented a version of map to show how I think it ought to work:
extension Publishers {
    struct MapLazily<Upstream: Publisher, Output>: Publisher {
        typealias Failure = Upstream.Failure

        let upstream: Upstream
        let transform: (Upstream.Output) -> Output

        init(upstream: Upstream, transform: @escaping (Upstream.Output) -> Output) {
            self.upstream = upstream
            self.transform = transform
        }

        public func receive<S: Subscriber>(subscriber: S) where S.Input == Output, S.Failure == Upstream.Failure {
            let mapSubscriber = Subscribers.LazyMapSubscriber(downstream: subscriber, transform: transform)
            upstream.receive(subscriber: mapSubscriber)
        }
    }
}
extension Subscribers {
    class LazyMapSubscriber<Input, DownstreamSubscriber: Subscriber>: Subscriber {
        let downstream: DownstreamSubscriber
        let transform: (Input) -> DownstreamSubscriber.Input

        init(downstream: DownstreamSubscriber, transform: @escaping (Input) -> DownstreamSubscriber.Input) {
            self.downstream = downstream
            self.transform = transform
        }

        func receive(subscription: Subscription) {
            downstream.receive(subscription: subscription)
        }

        func receive(_ input: Input) -> Subscribers.Demand {
            downstream.receive(transform(input))
        }

        func receive(completion: Subscribers.Completion<DownstreamSubscriber.Failure>) {
            downstream.receive(completion: completion)
        }
    }
}
extension Publisher {
    func mapLazily<Transformed>(transform: @escaping (Output) -> Transformed) -> AnyPublisher<Transformed, Failure> {
        Publishers.MapLazily(upstream: self, transform: transform).eraseToAnyPublisher()
    }
}
Using this operator, MySubscriber receives the subscription immediately and the mapLazily transform is only executed when there is demand:
(1...).publisher
    .mapLazily { value in
        print("Map: \(value)")
        return value * 2
    }
    .subscribe(MySubscriber())
// Only transforms the values when they are demanded by the downstream subscriber every 3 seconds:
//
// Subscribed
// Map: 1
// Value: 2
// Map: 2
// Value: 4
// Map: 3
// Value: 6
// Map: 4
// Value: 8
My guess is that the particular overload of map defined for Publishers.Sequence is using some kind of shortcut to enhance performance. This breaks for infinite sequences, but even for finite sequences eagerly exhausting the sequence regardless of the downstream demand messes with my intuition. In my view, the following code:
(1...3).publisher
    .map { value in
        print("Map: \(value)")
        return value * 2
    }
    .subscribe(MySubscriber())
ought to print:
Subscribed
Map: 1
Value: 2
Map: 2
Value: 4
Map: 3
Value: 6
Complete
but instead prints:
Map: 1
Map: 2
Map: 3
Subscribed
Value: 2
Value: 4
Value: 6
Complete

Here's a simpler test that doesn't involve any custom subscribers:
(1...).publisher
    //.map { $0 }
    .flatMap(maxPublishers: .max(1)) { (i: Int) -> AnyPublisher<Int, Never> in
        Just<Int>(i)
            .delay(for: 3, scheduler: DispatchQueue.main)
            .eraseToAnyPublisher()
    }
    .sink { print($0) }
    .store(in: &storage)
It works as expected, but then if you uncomment the .map you get nothing, because the .map operator is accumulating the infinite upstream values without publishing anything.
On the basis of your hypothesis that map is somehow optimizing for a preceding sequence publisher, I tried this workaround:
(1...).publisher.eraseToAnyPublisher()
    .map { $0 }
    // ...
And sure enough, it fixed the problem! By hiding the sequence publisher from the map operator, we prevent the optimization.
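One way to see the specialization directly (a sketch; the exact generic signatures may vary between SDK versions) is to compare the static types of the two pipelines:
import Combine

// With the concrete range publisher, overload resolution picks the map overload
// defined on Publishers.Sequence, which transforms the underlying sequence itself:
let fused = (1...3).publisher.map { $0 * 2 }
print(type(of: fused))   // expected: some Publishers.Sequence<…, Never> type

// After type erasure, the generic Publishers.Map operator is chosen instead,
// which simply relays the downstream demand to the upstream:
let erased = (1...3).publisher.eraseToAnyPublisher().map { $0 * 2 }
print(type(of: erased))  // expected: Publishers.Map<AnyPublisher<Int, Never>, Int>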


Falling Max Swift Combine Publisher

I'm working on a publisher for Swift/Combine
Given a stream of inputs, I want to record the max value.
If the next number is lower, take one from the last recorded max value and emit that.
Input: [1,2,3,4,5,2,3,3,1]
Output: [1,2,3,4,5,4,3,3,2]
I can do this easily with the following code; however, I really don't like the instance variable:
var lastMaxInstanceValue: Float = 0

publisher
    .map { newValue in
        if newValue > lastMaxInstanceValue {
            lastMaxInstanceValue = newValue
        } else {
            lastMaxInstanceValue = max(0, lastMaxInstanceValue - 1)
        }
        return lastMaxInstanceValue
    }
    .assign(to: \.percentage, on: self)
    .store(in: &cancellables)
So I wrote a publisher/subscriber here which encapsulates the map part above:
https://github.com/nthState/FallingMaxPublisher
With my publisher, the code turns into:
publisher
    .fallingMax()
    .assign(to: \.percentage, on: self)
    .store(in: &cancellables)
My question is, is my GitHub publisher necessary? Can the value I want be calculated without having the extra variable?
You can use the scan operator to achieve this. scan emits an accumulated value computed from each upstream value and the previously accumulated value. You do, however, need to give it an initial value; based on your example I used 0:
publisher
    .scan(0, { max, value in
        value >= max ? value : max - 1
    })
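As a quick check against the input and output from the question, here is the same scan applied to an array publisher (a sketch; with a sequence publisher the values arrive synchronously, so the cancellable isn't stored):
import Combine

_ = [1, 2, 3, 4, 5, 2, 3, 3, 1].publisher
    .scan(0) { max, value in
        value >= max ? value : max - 1
    }
    .sink { print($0, terminator: " ") }
// prints: 1 2 3 4 5 4 3 3 2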
You can implement a fallingMax operator for SignedInteger outputs - as in your GitHub package - like so:
extension Publisher where Output: SignedInteger {
    func fallingMax(initial: Output = 0,
                    fadeDownAmount: Output = 1) -> AnyPublisher<Output, Failure> {
        self.scan(initial, { max, value in
            value >= max ? value : max - fadeDownAmount
        })
        .eraseToAnyPublisher()
    }
}
As per @Rob's suggestion, if you don't want to supply an initial value (the first value is then used as the initial output), you can use an Optional (notice the .compactMap to bring it back to non-optional):
extension Publisher where Output: SignedInteger {
    func fallingMax(initial: Output? = .none,
                    fadeDownAmount: Output = 1) -> AnyPublisher<Output, Failure> {
        self.scan(initial, { max, value in
            max.map { value >= $0 ? value : $0 - fadeDownAmount } ?? value
        })
        .compactMap { $0 }
        .eraseToAnyPublisher()
    }
}
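To illustrate the difference (a sketch, assuming the Optional-initial variant above is the one in scope): with a negative first value, this version starts from the first value itself, whereas the initial: 0 version starts falling from 0:
import Combine

_ = [-3, -5, -1].publisher
    .fallingMax()
    .sink { print($0, terminator: " ") }
// prints: -3 -4 -1   (the initial: 0 variant would print: -1 -2 -1)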

How to cancel an Asynchronous function in Swift

In Swift, what is the common practice to cancel an async execution?
Using this example, which executes the closure asynchronously, what is the way to cancel the async function?
func getSumOf(array: [Int], handler: @escaping ((Int) -> Void)) {
    // step 2
    var sum: Int = 0
    for value in array {
        sum += value
    }

    // step 3
    Globals.delay(0.3, closure: {
        handler(sum)
    })
}

func doSomething() {
    // step 1
    self.getSumOf(array: [16, 756, 442, 6, 23]) { [weak self] (sum) in
        print(sum)
        // step 4, finishing the execution
    }
}

// Here we are calling the closure with a delay of 0.3 seconds.
// It will print the sum of all the passed numbers.
Unfortunately, there is no generalized answer to this question as it depends entirely upon your asynchronous implementation.
Let's imagine that your delay was the typical naive implementation:
static func delay(_ timeInterval: TimeInterval, closure: @escaping () -> Void) {
    DispatchQueue.main.asyncAfter(deadline: .now() + timeInterval) {
        closure()
    }
}
That's not going to be cancelable.
However, you can redefine it to use DispatchWorkItem, which is cancelable:
@discardableResult
static func delay(_ timeInterval: TimeInterval, closure: @escaping () -> Void) -> DispatchWorkItem {
    let task = DispatchWorkItem {
        closure()
    }
    DispatchQueue.main.asyncAfter(deadline: .now() + timeInterval, execute: task)
    return task
}
By marking it @discardableResult, you can use it like before, but if you want to cancel it, grab the returned value and hold on to it. E.g., you can define your asynchronous sum routine to use this pattern, too:
@discardableResult
func sum(of array: [Int], handler: @escaping (Int) -> Void) -> DispatchWorkItem {
    let sum = array.reduce(0, +)
    return Globals.delay(3) {
        handler(sum)
    }
}
Now, doSomething can, if it wants, capture the returned value and use it to cancel the asynchronously scheduled task:
func doSomething() {
    let task = sum(of: [16, 756, 442, 6, 23]) { sum in
        print(Date(), sum)
    }

    ...

    task.cancel()
}
You can also implement the delay with a Timer:
@discardableResult
static func delay(_ timeInterval: TimeInterval, closure: @escaping () -> Void) -> Timer {
    Timer.scheduledTimer(withTimeInterval: timeInterval, repeats: false) { _ in
        closure()
    }
}
And
@discardableResult
func sum(of array: [Int], handler: @escaping (Int) -> Void) -> Timer {
    let sum = array.reduce(0, +)
    return Globals.delay(3) {
        handler(sum)
    }
}
But this time, you'd invalidate the timer:
func doSomething() {
    weak var timer = sum(of: [16, 756, 442, 6, 23]) { sum in
        print(Date(), sum)
    }

    ...

    timer?.invalidate()
}
It must be noted that the above applies only to simple “delay” scenarios. It is not a general-purpose solution for stopping asynchronous processes. If the asynchronous task consists of some time-consuming for loop, for example, the above is insufficient.
Let's say you are doing some really complicated calculation in a for loop (e.g. processing the pixels of an image, processing the frames of a video, etc.). In that case, because there is no preemptive cancelation, you'd need to manually check whether the DispatchWorkItem or the Operation has been canceled by checking their respective isCancelled properties.
For example, let's consider an operation to sum all primes less than 1 million:
class SumPrimes: Operation {
    override func main() {
        var sum = 0
        for i in 1 ..< 1_000_000 {
            if isPrime(i) {
                sum += i
            }
        }
        print(Date(), sum)
    }

    func isPrime(_ value: Int) -> Bool { ... } // this is slow
}
(Obviously, this isn't an efficient way to solve the “sum of primes less than x” problem, but it's just an example for illustrative purposes.)
And
let queue = OperationQueue()
let operation = SumPrimes()
queue.addOperation(operation)
We're not going to be able to cancel that. Once it starts, there’s no stopping it.
But we can make it cancelable by adding a check for isCancelled in our loop:
class SumPrimes: Operation {
    override func main() {
        var sum = 0
        for i in 1 ..< 1_000_000 {
            if isCancelled { return }
            if isPrime(i) {
                sum += i
            }
        }
        print(Date(), sum)
    }

    func isPrime(_ value: Int) -> Bool { ... }
}
And
let queue = OperationQueue()
let operation = SumPrimes()
queue.addOperation(operation)
...
operation.cancel()
Bottom line, if it's something other than a simple delay and you want it to be cancelable, you have to integrate the cancellation check into the code that runs asynchronously.
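For comparison, the same cooperative check with a DispatchWorkItem might look like the sketch below (not part of the answer above; PrimeSummer is a hypothetical wrapper, isPrime(_:) is a naive stand-in for the slow helper, and the work item is stored in a property so the closure can poll its isCancelled flag):
import Foundation

// Naive, deliberately slow stand-in for the isPrime(_:) helper used above.
func isPrime(_ value: Int) -> Bool {
    guard value > 1 else { return false }
    return !(2..<value).contains { value % $0 == 0 }
}

final class PrimeSummer {
    private var item: DispatchWorkItem?

    func start() {
        let item = DispatchWorkItem { [weak self] in
            var sum = 0
            for i in 1 ..< 1_000_000 {
                // Cooperative cancellation: stop as soon as the work item is cancelled.
                if self?.item?.isCancelled ?? true { return }
                if isPrime(i) { sum += i }
            }
            print(Date(), sum)
        }
        self.item = item
        DispatchQueue.global().async(execute: item)
    }

    func cancel() {
        item?.cancel()
    }
}

// Usage:
// let summer = PrimeSummer()
// summer.start()
// ...
// summer.cancel()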
Using this example..., what is the way to cancel the async function?
Using that example, there is no such way. The only way to avoid printing the sum is for self to go out of existence some time in the 0.3 seconds immediately after the call.
(There are ways to make a cancellable timer, but the timer you've made, assuming that it's the delay I think it is, is not cancellable.)
I don't know your algorithm, but first I have a few suggestions:
If you want to delay, do it outside of the getSumOf function, to respect the Single Responsibility principle.
Use the built-in reduce function to sum the items in the array in a simpler and more efficient way.
You can use DispatchWorkItem to build a cancellable task, so you can remove the getSumOf function and edit the doSomething function as below.
let yourArray = [16, 756, 442, 6, 23]

let workItem = DispatchWorkItem {
    // Your async code goes in here
    let sum = yourArray.reduce(0, +)
    print(sum)
}

// Execute the work item after 0.3 seconds
DispatchQueue.main.asyncAfter(deadline: .now() + 0.3, execute: workItem)

// You can cancel the work item if you no longer need it
workItem.cancel()
You can also look into OperationQueue for advanced use.

RxSwift How to split progress and result observables?

I need to make a long async calculation based on a String input and return a big Data instance.
I use Single trait to achieve this:
func calculateData(from: String) -> Single<Data>
This example is simple and works. But I also need to track progress — a number between 0 and 1. I'm doing something like this:
func calculateData(from: String) -> Observable<(Float, Data?)>
where I get the following sequence:
next: (0, nil)
next: (0.25, nil)
next: (0.5, nil)
next: (0.75, nil)
next: (1, result data)
complete
I check the progress and data to determine whether I have a result. It works, but it feels like a code smell to me. I want to separate the streams: an Observable for progress and a Single for the result. I know I can return a tuple or a structure with two observables, but I don't like that either.
How can I achieve this? Is it possible?
What you have is fine, although I would name the elements in the tuple:
func calculateData(from: String) -> Observable<(percent: Float, data: Data?)>

let result = calculateData(from: myString)
    .share()

result
    .map { $0.percent }
    .subscribe(onNext: { print("percent complete:", $0) })
    .disposed(by: disposeBag)

result
    .compactMap { $0.data }
    .subscribe(onNext: { print("completed data:", $0) })
    .disposed(by: disposeBag)
Another option is to use an enum that either returns percent complete OR the data:
enum Progress {
    case incomplete(Float)
    case complete(Data)
}
func calculateData(from: String) -> Observable<Progress>
However, doing that would make it harder to break the Observable up into two streams. To do so, you would have to extend Progress like so:
extension Progress {
    var percent: Float {
        switch self {
        case .incomplete(let percent):
            return percent
        case .complete:
            return 1
        }
    }

    var data: Data? {
        switch self {
        case .incomplete:
            return nil
        case .complete(let data):
            return data
        }
    }
}
And as you see, doing the above essentially turns the enum into the tuple you are already using. The nice thing about this, though, is that you get a compile-time guarantee that if the data is emitted, the progress will be 1.
If you want the best of both worlds, then use a struct:
struct Progress {
    let percent: Float
    let data: Data?

    init(percent: Float) {
        guard 0 <= percent && percent < 1 else { fatalError() }
        self.percent = percent
        self.data = nil
    }

    init(data: Data) {
        self.percent = 1
        self.data = data
    }
}
func calculateData(from: String) -> Observable<Progress>
The above provides the compile-time guarantee of the enum and the ease of splitting that you get with the tuple. It also provides a run-time guarantee that progress will be in 0...1 and that if it is 1, the data will exist.
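For completeness, one hypothetical shape the producer side could take with this struct (a sketch only; the chunking and the fake work inside the loop are assumptions, not from the question):
import Foundation
import RxSwift

// Emits intermediate Progress(percent:) values while working,
// then a final Progress(data:) followed by completion.
func calculateData(from input: String) -> Observable<Progress> {
    Observable<Progress>.create { observer in
        let chunks = 4
        var data = Data()
        for step in 1...chunks {
            // ... one chunk of the real, long-running work would go here ...
            data.append(contentsOf: Array(input.utf8))
            if step < chunks {
                observer.onNext(Progress(percent: Float(step) / Float(chunks)))
            }
        }
        observer.onNext(Progress(data: data))
        observer.onCompleted()
        return Disposables.create()
    }
}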

"PassthroughSubject" seems to be thread-unsafe, is this a bug or limitation?

"PassthroughSubject" seems to be thread-unsafe. Please see the code below, I'm sending 100 values concurrently to a subscriber which only request .max(5). Subscriber should only get 5 values I think, but it actually got more. Is this a bug or limitation?
// Xcode11 beta2
var count = 0
let q = DispatchQueue(label: UUID().uuidString)
let g = DispatchGroup()
let subject = PassthroughSubject<Int, Never>()

let subscriber = AnySubscriber<Int, Never>(receiveSubscription: { (s) in
    s.request(.max(5))
}, receiveValue: { v in
    q.sync {
        count += 1
    }
    return .none
}, receiveCompletion: { c in
})

subject.subscribe(subscriber)

for i in 0..<100 {
    DispatchQueue.global().async(group: g) {
        subject.send(i)
    }
}

g.wait()
print("receive", count) // expected 5, but got more (7, 9, ...)
I believe the prefix operator can help:
/// Republishes elements up to the specified maximum count.
func prefix(Int) -> Publishers.Output<PassthroughSubject<Output, Failure>>
The max operator is returning the largest value at completion (and it's possible you're triggering completion more than once):
/// Publishes the maximum value received from the upstream publisher, after it finishes.
/// Available when Output conforms to Comparable.
func max() -> Publishers.Comparison<PassthroughSubject<Output, Failure>>
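For reference, wiring that suggestion in would look something like this (a sketch; it replaces the direct subject.subscribe(subscriber) call from the question and caps the republished stream at five elements, though it doesn't change how the subject itself handles concurrent send(_:) calls):
subject
    .prefix(5)
    .subscribe(subscriber)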

Assigning Observable to another

I have this object, TryOut. When initialized, it executes a private method every 2 seconds. Within that method, func execute(), there is internalStream, a local variable of type Observable<Int> that captures data that I wish to emit to the outside world.
The issue is that even though internalStream is assigned to a member property, public var outsideStream: Observable<Int>?, there aren't any events coming from subscribing to outsideStream. Why, though? Is there any reason behind that?
Working Case
The only way it works is by having a closure as a member property, public var broadcast: ((Observable<Int>) -> ())? = nil, and invoking it within the execute method like this: broadcast?(internalStream).
Sample code can be found in this gist. Thank you for your help.
For this kind of situation, when you want to produce events yourself, you are better off using one of the *Subject types provided by RxSwift.
For example:
Change the outsideStream declaration to:
public var outsideStream = PublishSubject<Int>()
Produce events in the right way:
@objc private func execute() {
    currentIndex += 1
    if currentIndex < data.count {
        outsideStream.onNext(data[currentIndex])
    }
    guard currentIndex + 1 > data.count && timer.isValid else { return }
    outsideStream.onCompleted()
    timer.invalidate()
}
And the usage:
let participant = TryOut()

participant.outsideStream
    .subscribe(
        onNext: { print("income index:", $0) },
        onCompleted: { print("stream completed") }
    )
    .disposed(by: bag)
gives you the output:
income index: 1
income index: 2
income index: 3
income index: 4
income index: 5
stream completed
P.S. There is also another way to do it, by using (or reproducing) the retry method from the RxSwiftExt library.