Apple Combine framework: How to execute multiple Publishers in parallel and wait for all of them to finish? - swift

I am discovering Combine. I wrote methods that make HTTP requests in a "combine" way, for example:
func testRawDataTaskPublisher(for url: URL) -> AnyPublisher<Data, Error> {
    var request = URLRequest(url: url,
                             cachePolicy: .useProtocolCachePolicy,
                             timeoutInterval: 15)
    request.httpMethod = "GET"
    return urlSession.dataTaskPublisher(for: request)
        .tryMap {
            return $0.data
        }
        .eraseToAnyPublisher()
}
I would like to call the method multiple times and run a task once all of the calls have finished, for example:
let myURLs: [URL] = ...
for url in myURLs {
    let cancellable = testRawDataTaskPublisher(for: url)
        .sink(receiveCompletion: { _ in }) { data in
            // save the data...
        }
}
The code above won't work because I have to store the cancellable in a property of the class.
The first question: is it a good idea to store many (for example 1000) cancellables in something like Set<AnyCancellable>? Won't it cause memory leaks?
var cancellables = Set<AnyCancellable>()
...
let cancellable = ...
cancellables.insert(cancellable) // ???
And the second question: how do I start a task when all the publishers have finished? I was thinking about something like this:
class Test {
    var cancellables = Set<AnyCancellable>()

    func run() {
        // show a loader
        let cancellable = runDownloads()
            .receive(on: RunLoop.main)
            .sink(receiveCompletion: { _ in }) { _ in
                // hide the loader
            }
        cancellables.insert(cancellable)
    }

    func runDownloads() -> AnyPublisher<Bool, Error> {
        let myURLs: [URL] = ...
        return Future<Bool, Error> { promise in
            let numberOfURLs = myURLs.count
            var numberOfFinishedTasks = 0
            for url in myURLs {
                let cancellable = testRawDataTaskPublisher(for: url)
                    .sink(receiveCompletion: { _ in }) { data in
                        // save the data...
                        numberOfFinishedTasks += 1
                        if numberOfFinishedTasks >= numberOfURLs {
                            promise(.success(true))
                        }
                    }
                cancellables.insert(cancellable)
            }
        }.eraseToAnyPublisher()
    }

    func testRawDataTaskPublisher(for url: URL) -> AnyPublisher<Data, Error> {
        ...
    }
}
Normally I would use DispatchGroup, start multiple HTTP tasks and consume the notification when the tasks are finished, but I am wondering how to write that in a modern way using Combine.

You can run some operations in parallel by creating a collection of publishers, applying the flatMap operator and then collect to wait for all of the publishers to complete before continuing. Here's an example that you can run in a playground:
import Combine
import Foundation

func delayedPublisher<Value>(_ value: Value, delay after: Double) -> AnyPublisher<Value, Never> {
    let p = PassthroughSubject<Value, Never>()
    DispatchQueue.main.asyncAfter(deadline: .now() + after) {
        p.send(value)
        p.send(completion: .finished)
    }
    return p.eraseToAnyPublisher()
}

let myPublishers = [1, 2, 3]
    .map { delayedPublisher($0, delay: 1 / Double($0)).print("\($0)").eraseToAnyPublisher() }

let cancel = myPublishers
    .publisher
    .flatMap { $0 }
    .collect()
    .sink { result in
        print("result:", result)
    }
Here is the output:
1: receive subscription: (PassthroughSubject)
1: request unlimited
2: receive subscription: (PassthroughSubject)
2: request unlimited
3: receive subscription: (PassthroughSubject)
3: request unlimited
3: receive value: (3)
3: receive finished
2: receive value: (2)
2: receive finished
1: receive value: (1)
1: receive finished
result: [3, 2, 1]
Notice that all of the publishers start immediately (in their original order).
The 1 / $0 delay causes the first publisher to take the longest to complete, so look at the order of the values at the end: since the first publisher took the longest, its value is the last item in the collected array.
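Applied to the original question, a minimal sketch could look like the following. It assumes the testRawDataTaskPublisher(for:) method and a myURLs array from the question are in scope, and uses Publishers.MergeMany rather than building the array-of-publishers pipeline by hand:
import Combine
import Foundation

// Sketch only: testRawDataTaskPublisher(for:) and myURLs come from the question.
var cancellables = Set<AnyCancellable>()

func runDownloads(for urls: [URL]) -> AnyPublisher<[Data], Error> {
    let downloads = urls.map { testRawDataTaskPublisher(for: $0) }
    // MergeMany subscribes to every download at once; collect() waits for all
    // of them to finish and emits a single array of results.
    return Publishers.MergeMany(downloads)
        .collect()
        .eraseToAnyPublisher()
}

runDownloads(for: myURLs)
    .receive(on: RunLoop.main)
    .sink(receiveCompletion: { completion in
        // hide the loader; a failure here means at least one download failed
        print(completion)
    }, receiveValue: { allData in
        // save the data...
        print("downloaded \(allData.count) payloads")
    })
    .store(in: &cancellables)
Note that with an Error failure type, a single failing download terminates the whole pipeline before collect() emits; if partial results are acceptable, each inner publisher can be mapped to a Result and its failure replaced before merging.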

Related

Subsequent ordered HTTP calls

I'm building a simple iOS client for HackerNews. I'm using their APIs, which let me get the ordered post IDs (sorted by new, best and top) and a single post item by passing its ID to the request. The problem I'm facing is the following: once I have the IDs array, how can I make an HTTP call for every post in an ordered fashion? With my current implementation, I'm not having any luck.
E.g. say the IDs array is [3001, 3002, 3003, 3004]. I tried calling the method that fetches those posts inside a for loop, using dispatch groups and dispatch semaphores, but I still get them unordered: the call for item 3003 completes before 3002, and so on.
The methods I'm using:
@Published var posts: [Post] = []

func getPosts(feedType: FeedType) {
    posts = []
    self.getFeedIDs(feedType: feedType).subscribe { ids in
        let firstFifteen = ids[0...15]
        let dGroup = DispatchGroup()
        let dQueue = DispatchQueue(label: "network-queue")
        let dSemaphore = DispatchSemaphore(value: 0)
        dQueue.async {
            for id in firstFifteen {
                dGroup.enter()
                self.getPost(id: id).subscribe { post in
                    self.posts.append(post)
                    dSemaphore.signal()
                    dGroup.leave()
                }
                dSemaphore.wait()
            }
        }
    }
}

func getFeedIDs(feedType: FeedType) -> Observable<[Int]> {
    return self.execute(url: URL(string: "https://hacker-news.firebaseio.com/v0/\(feedType)stories.json")!)
}

func getPost(id: Int) -> Observable<Post> {
    return self.execute(url: URL(string: "https://hacker-news.firebaseio.com/v0/item/\(id).json")!)
}

func execute<T: Decodable>(url: URL) -> Observable<T> {
    return Observable.create { observer -> Disposable in
        let task = URLSession.shared.dataTask(with: url) { res, _, _ in
            guard let data = res, let decoded = try? JSONDecoder().decode(T.self, from: data) else {
                return
            }
            observer.onNext(decoded)
            observer.onCompleted()
        }
        task.resume()
        return Disposables.create {
            task.cancel()
        }
    }
}
Any help would be greatly appreciated.
The semaphore makes no sense and is inefficient anyway.
Use the same pattern Apple suggests in conjunction with task groups: collect the data in a dictionary, and once the group notifies you, sort the data by the dictionary keys.
func getPosts(feedType: FeedType) {
    var postData = [Int: Post]()
    posts = []
    self.getFeedIDs(feedType: feedType).subscribe { ids in
        let firstFifteen = ids[0...15]
        let dGroup = DispatchGroup()
        for (index, element) in firstFifteen.enumerated() {
            dGroup.enter()
            self.getPost(id: element).subscribe { post in
                postData[index] = post
                dGroup.leave()
            }
        }
        dGroup.notify(queue: .main) {
            for key in postData.keys.sorted() {
                self.posts.append(postData[key]!)
            }
        }
    }
}
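For comparison, since this page is about Combine: the same ordered fetch can be written in Combine by running the requests one at a time and collecting the results. A minimal sketch, assuming a hypothetical getPost(id:) that returns AnyPublisher<Post, Error>:
import Combine
import Foundation

// Sketch only: Post and getPost(id:) are hypothetical Combine counterparts
// of the question's RxSwift methods.
struct Post: Decodable {
    let id: Int
    let title: String?
}

func getPost(id: Int) -> AnyPublisher<Post, Error> {
    let url = URL(string: "https://hacker-news.firebaseio.com/v0/item/\(id).json")!
    return URLSession.shared.dataTaskPublisher(for: url)
        .map(\.data)
        .decode(type: Post.self, decoder: JSONDecoder())
        .eraseToAnyPublisher()
}

// maxPublishers: .max(1) runs the requests serially, so collect() delivers
// the posts in the same order as the ids.
func getPosts(ids: [Int]) -> AnyPublisher<[Post], Error> {
    ids.publisher
        .setFailureType(to: Error.self)
        .flatMap(maxPublishers: .max(1)) { getPost(id: $0) }
        .collect()
        .eraseToAnyPublisher()
}
Serial requests trade throughput for ordering; fetching in parallel and sorting afterwards, as the DispatchGroup answer above does, is the faster option.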

Understanding actor and making it thread safe

I have an actor that is processing values and is then publishing the values with a Combine Publisher.
I have problems understanding actors: I thought that when using an actor in an async context, calls would automatically be serialised. However, the numbers get processed in different orders, not in the expected order (see the class tests for comparison).
I understand that if I wrapped a single Task around the for loop, the values would come back in order, but my understanding was that I could call a function on an actor and the calls would then be serialised automatically.
How can I make my actor thread safe so it publishes the values in the expected order even when it is called from different threads?
import XCTest
import Combine
import CryptoKit

actor AddNumbersActor {
    private let _numberPublisher: PassthroughSubject<(Int, String), Never> = .init()
    nonisolated lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()

    func process(_ number: Int) {
        let string = SHA512.hash(data: Data(String(number).utf8))
            .description
        _numberPublisher.send((number, string))
    }
}

class AddNumbersClass {
    private let _numberPublisher: PassthroughSubject<(Int, String), Never> = .init()
    lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()

    func process(_ number: Int) {
        let string = SHA512.hash(data: Data(String(number).utf8))
            .description
        _numberPublisher.send((number, string))
    }
}

final class TestActorWithPublisher: XCTestCase {
    var subscription: AnyCancellable?

    override func tearDownWithError() throws {
        subscription = nil
    }

    func testActor() throws {
        let addNumbers = AddNumbersActor()
        var numbersResults = [(int: Int, string: String)]()
        let expectation = expectation(description: "numberOfExpectedResults")
        let numberCount = 1000

        subscription = addNumbers.numberPublisher
            .sink { results in
                print(results)
                numbersResults.append(results)
                if numberCount == numbersResults.count {
                    expectation.fulfill()
                }
            }

        for number in 1...numberCount {
            Task {
                await addNumbers.process(number)
            }
        }

        wait(for: [expectation], timeout: 5)
        print(numbersResults.count)

        XCTAssertEqual(numbersResults[10].0, 11)
        XCTAssertEqual(numbersResults[100].0, 101)
        XCTAssertEqual(numbersResults[500].0, 501)
    }

    func testClass() throws {
        let addNumbers = AddNumbersClass()
        var numbersResults = [(int: Int, string: String)]()
        let expectation = expectation(description: "numberOfExpectedResults")
        let numberCount = 1000

        subscription = addNumbers.numberPublisher
            .sink { results in
                print(results)
                numbersResults.append(results)
                if numberCount == numbersResults.count {
                    expectation.fulfill()
                }
            }

        for number in 1...numberCount {
            addNumbers.process(number)
        }

        wait(for: [expectation], timeout: 5)
        print(numbersResults.count)

        XCTAssertEqual(numbersResults[10].0, 11)
        XCTAssertEqual(numbersResults[100].0, 101)
        XCTAssertEqual(numbersResults[500].0, 501)
    }
}
Using an actor does indeed serialize access.
The issue you're running into is that the tests aren't testing whether calls to process() are serialized; they are testing the execution order of the calls. And the order in which those Tasks execute is not guaranteed.
Try changing your AddNumbers objects so that, instead of the output order reflecting the order in which the calls were made, they succeed if calls are serialized but fail if concurrent calls are made. You can do this by keeping a count variable, incrementing it, sleeping a bit, then publishing the count. Concurrent calls will fail, since count will be incremented multiple times before it's published.
If you make that change, the test using an actor will pass. The test using a class will fail if it calls process() concurrently:
DispatchQueue.global(qos: .default).async {
    addNumbers.process()
}
It also helps to understand that Task scheduling depends on several factors. GCD will spin up lots of threads, whereas Swift concurrency uses only one worker thread per available core (I think!). So in some execution environments, just wrapping your work in Task { } might be enough to serialize it for you. I've found that iOS simulators act as if they have a single core, so task execution ends up being serialized. Also, otherwise unsafe code will work if you ensure the task runs on the main actor, since that guarantees serial execution:
Task { @MainActor in
    // ...
}
Here are modified tests showing all this:
class TestActorWithPublisher: XCTestCase {
    actor AddNumbersActor {
        private let _numberPublisher: PassthroughSubject<Int, Never> = .init()
        nonisolated lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()

        var count = 0

        func process() {
            // Increment the count here
            count += 1

            // Wait a bit...
            Thread.sleep(forTimeInterval: TimeInterval.random(in: 0...0.010))

            // Send it back. If other calls to process() were made concurrently,
            // count may have been incremented again before being sent:
            _numberPublisher.send(count)
        }
    }

    class AddNumbersClass {
        private let _numberPublisher: PassthroughSubject<Int, Never> = .init()
        lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()

        var count = 0

        func process() {
            count += 1
            Thread.sleep(forTimeInterval: TimeInterval.random(in: 0...0.010))
            _numberPublisher.send(count)
        }
    }

    var subscription: AnyCancellable?

    override func tearDownWithError() throws {
        subscription = nil
    }

    func testActor() throws {
        let addNumbers = AddNumbersActor()
        var numbersResults = [Int]()
        let expectation = expectation(description: "numberOfExpectedResults")
        let numberCount = 1000

        subscription = addNumbers.numberPublisher
            .sink { results in
                numbersResults.append(results)
                if numberCount == numbersResults.count {
                    expectation.fulfill()
                }
            }

        for _ in 1...numberCount {
            Task.detached(priority: .high) {
                await addNumbers.process()
            }
        }

        wait(for: [expectation], timeout: 10)
        XCTAssertEqual(numbersResults, Array(1...numberCount))
    }

    func testClass() throws {
        let addNumbers = AddNumbersClass()
        var numbersResults = [Int]()
        let expectation = expectation(description: "numberOfExpectedResults")
        let numberCount = 1000

        subscription = addNumbers.numberPublisher
            .sink { results in
                numbersResults.append(results)
                if numberCount == numbersResults.count {
                    expectation.fulfill()
                }
            }

        for _ in 1...numberCount {
            DispatchQueue.global(qos: .default).async {
                addNumbers.process()
            }
        }

        wait(for: [expectation], timeout: 5)
        XCTAssertEqual(numbersResults, Array(1...numberCount))
    }
}
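If the goal is for the published order to match the submission order, a minimal sketch (using the question's original AddNumbersActor inside the same test case) is to drive all the calls from a single Task, so each call is awaited before the next one starts:
// Sketch only: uses the AddNumbersActor from the question, inside the same
// XCTestCase. A single Task awaits each call, so both execution order and
// published order are deterministic.
func testActorOrdered() throws {
    let addNumbers = AddNumbersActor()
    var numbersResults = [(int: Int, string: String)]()
    let expectation = expectation(description: "numberOfExpectedResults")
    let numberCount = 1000

    subscription = addNumbers.numberPublisher
        .sink { results in
            numbersResults.append(results)
            if numberCount == numbersResults.count {
                expectation.fulfill()
            }
        }

    Task {
        for number in 1...numberCount {
            await addNumbers.process(number)
        }
    }

    wait(for: [expectation], timeout: 5)
    XCTAssertEqual(numbersResults.map(\.int), Array(1...numberCount))
}
The trade-off is that the calls no longer run concurrently, which is exactly what a deterministic order requires.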

How to limit flatMap concurrency in Combine while still having all source events processed?

If I specify the maxPublishers parameter, then source events after the first maxPublishers events won't be flat-mapped. But I want to limit only the concurrency, that is, to continue processing subsequent events once some of the first maxPublishers flat-mapped publishers have completed.
Publishers.Merge(
    addImageRequestSubject
        .flatMap(maxPublishers: .max(3)) { self.compressImage($0) }
        .compactMap { $0 }
        .flatMap(maxPublishers: .max(3)) { self.addImage($0) },
    addVideoRequestSubject
        .flatMap(maxPublishers: .max(3)) { self.addVideo(url: $0) }
)
.sink(receiveCompletion: { _ in }, receiveValue: {})
.store(in: &cancelBag)
I've also tried to limit concurrency with the help of OperationQueue, but maxConcurrentOperationCount doesn't seem to have an effect.
Publishers.Merge(
    addImageRequestSubject
        .receive(on: imageCompressionQueue)
        .flatMap { self.compressImage($0) }
        .compactMap { $0 }
        .receive(on: mediaAddingQueue)
        .flatMap { self.addImage($0) },
    addVideoRequestSubject
        .receive(on: mediaAddingQueue)
        .flatMap { self.addVideo(url: $0) }
)
.sink(receiveCompletion: { _ in }, receiveValue: {})
.store(in: &cancelBag)

private lazy var imageCompressionQueue: OperationQueue = {
    var queue = OperationQueue()
    queue.maxConcurrentOperationCount = 3
    return queue
}()

private lazy var mediaAddingQueue: OperationQueue = {
    var queue = OperationQueue()
    queue.maxConcurrentOperationCount = 3
    return queue
}()
The flat-mapped publishers look like this:
func compressImage(_ image: UIImage) -> Future<Data?, Never> {
    Future { promise in
        DispatchQueue.global().async {
            let result = image.compressTo(15)?.jpegData(compressionQuality: 1)
            promise(Result.success(result))
        }
    }
}
You have stumbled very beautifully right into the use case for the .buffer operator. Its purpose is to compensate for .flatMap backpressure by accumulating values that would otherwise be dropped.
I will illustrate with a completely artificial example:
class ViewController: UIViewController {
    let sub = PassthroughSubject<Int, Never>()
    var storage = Set<AnyCancellable>()
    var timer: Timer!

    override func viewDidLoad() {
        super.viewDidLoad()
        sub
            .flatMap(maxPublishers: .max(3)) { i in
                return Just(i)
                    .delay(for: 3, scheduler: DispatchQueue.main)
                    .eraseToAnyPublisher()
            }
            .sink { print($0) }
            .store(in: &storage)

        var count = 0
        self.timer = Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
            count += 1
            self.sub.send(count)
        }
    }
}
So, our publisher is emitting an incremented integer every second, but our flatMap has .max(3) and takes 3 seconds to republish a value. The result is that we start to miss values:
1
2
3
5
6
7
9
10
11
...
The solution is to put a buffer in front of the flatMap. It needs to be large enough to hold any missed values long enough for them to be requested:
sub
    .buffer(size: 20, prefetch: .keepFull, whenFull: .dropOldest)
    .flatMap(maxPublishers: .max(3)) { i in
The result is that all the numeric values do in fact arrive at the sink. Of course, in real life we could still lose values if the buffer is not large enough to compensate for the disparity between the rate at which the publisher emits values and the rate at which the backpressuring flatMap republishes them.
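Applied to the pipeline from the question, that would mean buffering each subject before its backpressuring flatMap. A rough sketch, reusing the question's subjects, helper methods and cancelBag, with arbitrary buffer sizes:
// Sketch only: subjects, helper methods and cancelBag are the ones from the
// question; the buffer sizes are arbitrary and should match how bursty the
// input can get.
Publishers.Merge(
    addImageRequestSubject
        .buffer(size: 100, prefetch: .keepFull, whenFull: .dropOldest)
        .flatMap(maxPublishers: .max(3)) { self.compressImage($0) }
        .compactMap { $0 }
        .buffer(size: 100, prefetch: .keepFull, whenFull: .dropOldest)
        .flatMap(maxPublishers: .max(3)) { self.addImage($0) },
    addVideoRequestSubject
        .buffer(size: 100, prefetch: .keepFull, whenFull: .dropOldest)
        .flatMap(maxPublishers: .max(3)) { self.addVideo(url: $0) }
)
.sink(receiveCompletion: { _ in }, receiveValue: { _ in })
.store(in: &cancelBag)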

Why does the .collect() operator in Swift Combine always send .unlimited demand upstream, regardless of the demand requested by the downstream subscriber?

I have been playing with Combine to understand how it works in more detail, and I created a custom Publisher, Subscription and Subscriber.
Here's how it looks.
The emoji beamer publisher along with its subscription:
struct EmojiBeamerPublisher: Publisher {
    typealias Output = String
    typealias Failure = Error

    private let emojis: [String] = ["👍","❤️","✅","🥰","😍","🚀","😅","🍑","🍞","🎅","❄️","🐻","👀","👄","🦷","✍️","🙏","👨‍💻","🐝","🐛","🦉","🦀","🐍","🐞","🧸"]

    func receive<S>(subscriber: S) where S: Subscriber, Self.Failure == S.Failure, Self.Output == S.Input {
        let subscription = EmojiBeamerSubscription(output: emojis, subscriber: subscriber)
        subscriber.receive(subscription: subscription)
    }
}

extension EmojiBeamerPublisher {
    private final class EmojiBeamerSubscription<S: Subscriber>: Subscription where S.Input == Output, S.Failure == Failure {
        var subscriber: S?
        let output: [String]

        init(output: [String], subscriber: S) {
            self.subscriber = subscriber
            self.output = output
        }

        func request(_ demand: Subscribers.Demand) {
            Swift.print("Demand: \(demand)") // Here I receive Unlimited demand
            var demand = demand
            Timer.scheduledTimer(withTimeInterval: 5, repeats: true) { [weak self] timer in
                guard let self = self else { return }
                guard demand > 0, let subscriber = self.subscriber else {
                    timer.invalidate()
                    self.subscriber?.receive(completion: .finished)
                    self.cancel()
                    return
                }
                demand -= 1
                demand += subscriber.receive(self.output.randomElement()! + " \(Date())")
            }
        }

        func cancel() {
            subscriber = nil
        }
    }
}
Here is my Custom subscriber:
final class EmojiBeamerSubscriber<Input, Failure: Error>: Subscriber, Cancellable {
    var subscription: Subscription?
    let receiveValue: (Input) -> Void

    init(receiveValue: @escaping (Input) -> Void) {
        self.receiveValue = receiveValue
    }

    func receive(subscription: Subscription) {
        self.subscription = subscription
        subscription.request(.max(3)) // Here I send only 3 as max demand
    }

    func receive(_ input: Input) -> Subscribers.Demand {
        receiveValue(input)
        return .none
    }

    func receive(completion: Subscribers.Completion<Failure>) {
        print("Will handle later:", completion)
    }

    func cancel() {
        self.subscription?.cancel()
        self.subscription = nil
    }
}

extension Publisher {
    func myCustomSink(receiveValueHandler: @escaping (Self.Output) -> Void) -> AnyCancellable {
        let myCustomSubscriber = EmojiBeamerSubscriber<Self.Output, Self.Failure>(receiveValue: receiveValueHandler)
        subscribe(myCustomSubscriber)
        return AnyCancellable(myCustomSubscriber)
    }
}
As you can see, my custom subscriber requests a demand of .max(3). If I don't use collect, everything works fine: I get an emoji beamed every 5 seconds, and after 3 values I get a .finished completion.
Works fine (and sends .max(3) demand):
let emojiBeamer = EmojiBeamerPublisher()
var cancellables = Set<AnyCancellable>()

emojiBeamer
    .myCustomSink { value in Swift.print("Random Emoji:: \(value)") }
    .store(in: &cancellables)
However, if I simply add .collect() to catch all 3 results at once in an array, it requests .unlimited demand on my subscription, resulting in a never-ending subscription because my demand will never reach zero.
Never completes (and sends unlimited demand):
let emojiBeamer = EmojiBeamerPublisher()
var cancellables = Set<AnyCancellable>()

emojiBeamer
    .collect()
    .myCustomSink { value in Swift.print("Random Emoji:: \(value)") }
    .store(in: &cancellables)
Is there something wrong with my implementation? Or did I misunderstand the purpose of the .collect() operator?
Thank you in advance :)
From the documentation:
This publisher requests an unlimited number of elements from the upstream publisher and uses an unbounded amount of memory to store the received values. The publisher may exert memory pressure on the system for very large sets of elements.
So the behaviour you noticed is the correct one, as collect() sends an unlimited demand upstream.
The unlimited demand causes the demand -= 1 instruction to do nothing, so the demand > 0 check will always pass, resulting in a timer that never invalidates and never sends the completion. You will need an extra condition to make the "collected" stream a finite one.
For infinite streams, the collect(_:) overload (the one that lets you pass a number of items to collect) behaves better with regard to demand, but it still requests more elements from upstream than one might expect:
When this publisher receives a request for .max(n) elements, it requests .max(count * n) from the upstream publisher.
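If you want the stream itself to end after a fixed number of values, one option is to cap it before collecting, for example with prefix(_:), which completes after the given number of elements and cancels the upstream subscription. A minimal sketch against the publisher above:
// Sketch only: prefix(3) turns the endless emoji stream into a finite one,
// so collect() can emit its array and then complete.
let emojiBeamer = EmojiBeamerPublisher()
var cancellables = Set<AnyCancellable>()

emojiBeamer
    .prefix(3)
    .collect()
    .myCustomSink { values in Swift.print("Random Emojis: \(values)") }
    .store(in: &cancellables)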

Loop over Publisher Combine framework

I have the following function to perform an URL request:
final class ServiceManagerImpl: ServiceManager, ObservableObject {
    private let session = URLSession.shared

    func performRequest<T>(_ request: T) -> AnyPublisher<String?, APIError> where T: Request {
        session.dataTaskPublisher(for: self.urlRequest(request))
            .tryMap { data, response in
                try self.validateResponse(response)
                return String(data: data, encoding: .utf8)
            }
            .mapError { error in
                return self.transformError(error)
            }
            .eraseToAnyPublisher()
    }
}
With the two following functions, I can call the desired requests from the corresponding ViewModel:
final class AuditServiceImpl: AuditService {
    private let serviceManager: ServiceManager = ServiceManagerImpl()

    func emptyAction() -> AnyPublisher<String?, APIError> {
        let request = AuditRequest(act: "", nonce: String.randomNumberGenerator)
        return serviceManager.performRequest(request)
    }

    func burbleAction(offset: Int) -> AnyPublisher<String?, APIError> {
        let request = AuditRequest(act: "burble", nonce: String.randomNumberGenerator, offset: offset)
        return serviceManager.performRequest(request)
    }
}
final class AuditViewModel: ObservableObject {
    @Published var auditLog: String = ""

    private let auditService: AuditService = AuditServiceImpl()
    private var cancellableSet = Set<AnyCancellable>()

    init() {
        let timer = Timer(timeInterval: 5, repeats: true) { _ in
            self.getBurbles()
        }
        RunLoop.main.add(timer, forMode: .common)
    }

    func getBurbles() {
        auditService.emptyAction()
            .flatMap { [unowned self] offset -> AnyPublisher<String?, APIError> in
                let currentOffset = Int(offset?.unwrapped ?? "") ?? 0
                return self.auditService.burbleAction(offset: currentOffset)
            }
            .receive(on: RunLoop.main)
            .sink(receiveCompletion: { [unowned self] completion in
                print(completion)
            }, receiveValue: { [weak self] burbles in
                self?.auditLog = burbles!
            })
            .store(in: &cancellableSet)
    }
}
Everything is fine the first time getBurbles() runs. However, on subsequent calls, print(completion) shows finished and the code never executes self?.auditLog = burbles!.
I don't know how I can call getBurbles() repeatedly and get the response at regular intervals.
Edit
The whole process in a nutshell:
I call getBurbles() from the class initializer
getBurbles() calls 2 nested functions: emptyAction() and burbleAction(offset: Int)
Those 2 functions generate different requests and call performRequest<T>(_ request: T)
Finally, I set the response on the auditLog variable and show it in the SwiftUI layer
There are at least two issues here.
First, when a publisher errors, it will never produce elements again. That's a problem because you want to reuse the publisher and call it many times, even if the inner publisher fails. You need to handle the error inside the flatMap and make sure it doesn't propagate to the enclosing publisher (i.e. you can return a Result or some other enum or tuple that indicates you should display an error state).
Second, flatMap is almost certainly not what you want here, since it will merge all of the API calls and return them in arbitrary order. If you want to cancel any in-flight requests and only show the latest results, you should use .map followed by switchToLatest.
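Here is a sketch of what that could look like inside AuditViewModel, assuming the same AuditService API and properties as in the question: the Timer is replaced by a Timer publisher, errors are caught inside the inner pipeline so the outer publisher never fails, and switchToLatest cancels any still-running request when a new tick arrives.
func startPollingBurbles() {
    // Sketch only: auditService, auditLog and cancellableSet are the question's
    // properties; APIError and the `unwrapped` helper come from the question too.
    Timer.publish(every: 5, on: .main, in: .common)
        .autoconnect()
        .map { [unowned self] _ -> AnyPublisher<Result<String?, APIError>, Never> in
            self.auditService.emptyAction()
                .flatMap { offset -> AnyPublisher<String?, APIError> in
                    let currentOffset = Int(offset?.unwrapped ?? "") ?? 0
                    return self.auditService.burbleAction(offset: currentOffset)
                }
                // Turn failures into values so the outer pipeline never fails.
                .map { Result<String?, APIError>.success($0) }
                .catch { error in Just(Result<String?, APIError>.failure(error)) }
                .eraseToAnyPublisher()
        }
        .switchToLatest()
        .receive(on: RunLoop.main)
        .sink { [weak self] result in
            if case .success(let burbles) = result {
                self?.auditLog = burbles ?? ""
            }
        }
        .store(in: &cancellableSet)
}
With switchToLatest, a slow request is simply abandoned when the next tick fires; if every response must be processed, flatMap with the same error handling inside would be the alternative.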