How to limit concurrent live URLSessions with Combine? - swift

I have a lot (~200) of URLs for images, and I need to download each one, then process (resize) it, then update the cache. The thing is, I only want at most 3 requests in flight at once, and since the images are heavy, I also don't want a lot of responses "hanging" around waiting to be processed (and taking up memory...).
TL;DR: I want to start the next (4th) network request only after receiveValue in the sink has been called for one of the first 3 requests (i.e. after both the network response and the processing are done).
Will this flow work, and will it hold on to the waiting URLs rather than drop them on the floor?
Also, do I need that buffer() call? I added it after seeing this answer: https://stackoverflow.com/a/67011837/2242359
wayTooManyURLsToHandleAtOnce // this is a `[URL]`
    .publisher
    .buffer(size: .max, prefetch: .byRequest, whenFull: .dropNewest) // NEEDED?
    .flatMap(maxPublishers: .max(3)) { url in
        URLSession.shared
            .dataTaskPublisher(for: url)
            .map { (data: Data, _) -> Picture in
                Picture(from: data)
            }
    }
    .tryCompactMap {
        resizeImage(picture: $0) // takes a while and might fail
    }
    .receive(on: DispatchQueue.main)
    .sink { completion in
        // handling completion...
    } receiveValue: { resizedImage in
        self.cache.append(resizedImage)
    }
    .store(...)
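
For reference, one way to make the .max(3) window cover the resize step as well is to do the processing inside the flatMap, so a new URL is only requested once both the download and the resize of an earlier one have finished. A rough sketch, assuming resizeImage returns an optional Picture and that a cancellables set exists (this is an illustration, not the asker's exact code):

wayTooManyURLsToHandleAtOnce
    .publisher
    .setFailureType(to: Error.self)
    .flatMap(maxPublishers: .max(3)) { url in
        URLSession.shared
            .dataTaskPublisher(for: url)
            .mapError { $0 as Error }
            .tryCompactMap { data, _ in
                resizeImage(picture: Picture(from: data)) // download + resize both count toward the .max(3) window
            }
    }
    .receive(on: DispatchQueue.main)
    .sink { completion in
        // handling completion...
    } receiveValue: { resizedImage in
        self.cache.append(resizedImage)
    }
    .store(in: &cancellables) // assumed Set<AnyCancellable> property

With a plain sequence publisher the elements are delivered on demand, so the extra buffer shouldn't be necessary in this shape, but that is worth verifying against the linked answer.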

I would use a subject. This is not an optimal solution, but it seems to work and may trigger other ideas.
var cancellable: AnyCancellable?
var urls: [String] = (0...6).map { _ in "http://httpbin.org/delay/" + String((0...2).randomElement()!) }
var subject: PassthroughSubject<[String], Never> = .init()
let maxConcurrentRequests = 3

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    print(urls)
    cancellable = subject
        .flatMap({ urls -> AnyPublisher<[URLSession.DataTaskPublisher.Output], URLError> in
            let requests = urls.map { URLSession.shared.dataTaskPublisher(for: URL(string: $0)!) }
            return Publishers.MergeMany(requests)
                .collect().eraseToAnyPublisher()
        })
        .print()
        .sink(receiveCompletion: { completion in
            print(completion)
        }, receiveValue: { value in
            print(value)
            if self.urls.count <= self.maxConcurrentRequests {
                self.urls.removeAll()
                self.subject.send(completion: .finished)
            } else {
                self.urls.removeLast(self.maxConcurrentRequests)
                self.subject.send(Array(self.urls.suffix(self.maxConcurrentRequests)))
            }
        })
    subject.send(Array(urls.suffix(maxConcurrentRequests)))
}

Related

Swift Combine - Accessing separate lists of publishers

I have two lists of URLs that return some links to images.
The lists are passed into a Future like this:
static func loadRecentEpisodeImagesFuture(request: [URL]) -> AnyPublisher<[RecentEpisodeImages], Never> {
    return Future { promise in
        print(request)
        networkAPI.recentEpisodeImages(url: request)
            .sink(receiveCompletion: { _ in },
                  receiveValue: { recentEpisodeImages in
                      promise(.success(recentEpisodeImages))
                  })
            .store(in: &recentImagesSubscription)
    }
    .eraseToAnyPublisher()
}
Which calls:
/// Get a list of image sizes associated with a featured episode.
func featuredEpisodeImages(featuredUrl: [URL]) -> AnyPublisher<[FeaturedEpisodeImages], Error> {
    let featuredEpisodesImages = featuredUrl.map { (featuredUrl) -> AnyPublisher<FeaturedEpisodeImages, Error> in
        return URLSession.shared
            .dataTaskPublisher(for: featuredUrl)
            .map(\.data)
            .decode(type: FeaturedEpisodeImages.self, decoder: decoder)
            .receive(on: networkApiQueue)
            .catch { _ in Empty<FeaturedEpisodeImages, Error>() }
            .print("###Featured###")
            .eraseToAnyPublisher()
    }
    return Publishers.MergeMany(featuredEpisodesImages).collect().eraseToAnyPublisher()
}

/// Get a list of image sizes associated with a recent episode.
func recentEpisodeImages(recentUrl: [URL]) -> AnyPublisher<[RecentEpisodeImages], Error> {
    let recentEpisodesImages = recentUrl.map { (recentUrl) -> AnyPublisher<RecentEpisodeImages, Error> in
        return URLSession.shared
            .dataTaskPublisher(for: recentUrl)
            .map(\.data)
            .decode(type: RecentEpisodeImages.self, decoder: decoder)
            .receive(on: networkApiQueue)
            .catch { _ in Empty<RecentEpisodeImages, Error>() }
            .print("###Recent###")
            .eraseToAnyPublisher()
    }
    return Publishers.MergeMany(recentEpisodesImages).collect().eraseToAnyPublisher()
}
and is attached to the app state:
/// Takes an action and returns a future mapped to another action.
static func recentEpisodeImages(action: RequestRecentEpisodeImages) -> AnyPublisher<Action, Never> {
    return loadRecentEpisodeImagesFuture(request: action.request)
        .receive(on: networkApiQueue)
        .map({ images in ResponseRecentEpisodeImages(response: images) })
        .replaceError(with: RequestFailed())
        .eraseToAnyPublisher()
}
It seems that:
return Publishers.MergeMany(recentEpisodes).collect().eraseToAnyPublisher()
doesn't give me a reliable downstream value as whichever response finishes last overwrites the earlier response.
I am able to log the responses of both series of requests. Both are processing the correct arrays and returning the proper JSON.
I would like something like:
return recentEpisodeImages
but currently this gives me the error
Cannot convert return expression of type '[AnyPublisher<RecentEpisodeImages, Error>]' to return type 'AnyPublisher<[RecentEpisodeImages], Error>'
How can I collect the values of the inner publisher and return them as
AnyPublisher<[RecentEpisodeImages], Error>
Presuming that the question is how to turn an array of URLs into an array of what you get when you download and process the data from those URLs, the answer is: turn the array into a sequence publisher, process each URL by way of flatMap, and collect the result.
Here, for instance, is how to turn an array of URLs representing images into an array of the actual images (not identically what you're trying to do, but probably pretty close):
func publisherOfArrayOfImages(urls: [URL]) -> AnyPublisher<[UIImage], Error> {
    urls.publisher
        .flatMap { (url: URL) -> AnyPublisher<UIImage, Error> in
            return URLSession.shared.dataTaskPublisher(for: url)
                .compactMap { UIImage(data: $0.0) }
                .mapError { $0 as Error }
                .eraseToAnyPublisher()
        }
        .collect().eraseToAnyPublisher()
}
And here's how to test it:
let urls = [
    URL(string: "http://www.apeth.com/pep/moe.jpg")!,
    URL(string: "http://www.apeth.com/pep/manny.jpg")!,
    URL(string: "http://www.apeth.com/pep/jack.jpg")!,
]
let pub = publisherOfArrayOfImages(urls: urls)
pub.sink { print($0) }
    receiveValue: { print($0) }
    .store(in: &storage)
You'll see that what pops out the bottom of the pipeline is an array of three images, corresponding to the array of three URLs we started with.
(Note, please, that the order of the resulting array is random. We fetched the images asynchronously, so the results arrive back at our machine in whatever order they please. There are ways around that problem, but it is not what you asked about.)
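If order does matter, one common trick is to tag each download with its index and sort after collecting. A rough sketch in the same vein as the function above (the helper name is just illustrative):

func publisherOfOrderedImages(urls: [URL]) -> AnyPublisher<[UIImage], Error> {
    urls.enumerated().publisher
        .setFailureType(to: Error.self)
        .flatMap { pair -> AnyPublisher<(Int, UIImage), Error> in
            URLSession.shared.dataTaskPublisher(for: pair.element)
                .compactMap { UIImage(data: $0.data) }
                .map { (pair.offset, $0) } // remember where this image belongs
                .mapError { $0 as Error }
                .eraseToAnyPublisher()
        }
        .collect()
        .map { pairs in
            pairs.sorted { $0.0 < $1.0 }.map { $0.1 } // restore the original request order
        }
        .eraseToAnyPublisher()
}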

How can I queue up URLSession.DataTaskPublisher requests so that only one is made at a time?

In the code below, an array of app objects is used to create an array of publishers, which are merged into an array of release objects.
apps.map { latestRelease(app: $0) }.merge()
Here is how latestRelease is done:
func latestRelease(app: App) -> AnyPublisher<Release, Error> {
    do {
        let request = try requestFactory.make(.get, "apps/\(app.owner.name)/\(app.name)/releases/latest")
        return publisherFactory.make(for: request)
            .mapError { $0 as Error }
            .map { data, _ in data }
            .decode(type: Release.self, decoder: decoder)
            .eraseToAnyPublisher()
    } catch {
        return Fail(error: error)
            .eraseToAnyPublisher()
    }
}
The network requests are done with a factory.
struct AppCenterPublisherFactory: DataTaskPublisherFactory {
    let session: URLSession

    init(session: URLSession = .shared) {
        session.configuration.httpMaximumConnectionsPerHost = 1
        self.session = session
    }

    func make(for request: URLRequest) -> URLSession.DataTaskPublisher {
        return session.dataTaskPublisher(for: request)
    }
}
The problem is the release publishers make network requests immediately. This causes the server to return 429 Too Many Requests. How can I queue up URLSession.DataTaskPublisher requests so that only one is made at a time with a delay between each request?
If you use append instead of merge to combine the publishers, they will run serially instead of concurrently.
For a delay, you can prepend Empty().delay(for: cooldown, scheduler: whatever) to each release publisher except the first.
func releases<S: Scheduler>(of apps: [App], scheduler: S, cooldown: S.SchedulerTimeType.Stride)
    -> AnyPublisher<Release, Error>
{
    let singles = apps.map { latestRelease(app: $0) }
    guard let first = singles.first else { return Empty().eraseToAnyPublisher() }
    let combo: AnyPublisher<Release, Error> = singles.dropFirst()
        .reduce(first.eraseToAnyPublisher()) { combo, single in
            combo
                .append(Empty().delay(for: cooldown, scheduler: scheduler))
                .append(single)
                .eraseToAnyPublisher()
        }
    return combo
}
Test:
let testApps: [App] = [
    .init(name: "Facebook", owner: .init(name: "zuck")),
    .init(name: "Kindle", owner: .init(name: "bezos")),
    .init(name: "Crossword", owner: .init(name: "shortz")),
]

print("starting at \(Date())")
let ticket = releases(of: testApps, scheduler: DispatchQueue.global(qos: .utility), cooldown: .seconds(2))
    .sink(
        receiveCompletion: { print("got \($0) at \(Date())") },
        receiveValue: { print("got \($0) at \(Date())") })
Output:
starting at 2020-05-12 17:45:17 +0000
got Release(name: "Facebook") at 2020-05-12 17:45:17 +0000
got Release(name: "Kindle") at 2020-05-12 17:45:19 +0000
got Release(name: "Crossword") at 2020-05-12 17:45:21 +0000
got finished at 2020-05-12 17:45:21 +0000
You can deliver tasks on a specified scheduler using receive(on:options:). As a sample:
let queue = DispatchQueue(label: "App_Queue", qos: .default)
Then change it like this:
publisherFactory.make(for: request).receive(on: queue) // rest of code...
Hope you get around it.

How to process an array of tasks asynchronously with Swift Combine

I have a publisher which makes a network call and returns an array of IDs. I now need to make another network call for each ID to get all my data, and I want the final publisher to emit the resulting object.
First network result:
"user": {
"id": 0,
"items": [1, 2, 3, 4, 5]
}
Final object:
struct User {
    let id: Int
    let items: [Item]
    ... other fields ...
}

struct Item {
    let id: Int
    ... other fields ...
}
Handling multiple network calls:
userPublisher.flatMap { user in
    let itemIDs = user.items
    return Future<[Item], Never>() { fulfill in
        ... OperationQueue of network requests ...
    }
}
I would like to perform the network requests in parallel, since they are not dependent on each other. I'm not sure if Future is right here, but I'd imagine I would then have code to do a
DispatchGroup or OperationQueue and fulfill when they're all done. Is there more of a Combine way of doing this?
Does Combine have a concept of splitting one stream into many parallel streams and joining the streams together?
Combine offers extensions around URLSession to handle network requests; Future is only a fine candidate if you really need to integrate with OperationQueue-based networking. You can run multiple Futures and collect them at some point, but I'd really suggest looking at the URLSession extensions for Combine.
struct User: Codable {
    var username: String
}

let requestURL = URL(string: "https://example.com/")!
let publisher = URLSession.shared.dataTaskPublisher(for: requestURL)
    .map { $0.data }
    .decode(type: User.self, decoder: JSONDecoder())
Regarding running a batch of requests, it's possible to use Publishers.MergeMany, e.g.:
struct User: Codable {
    var username: String
}

let userIds = [1, 2, 3]

let subscriber = Just(userIds)
    .setFailureType(to: Error.self)
    .flatMap { (values) -> Publishers.MergeMany<AnyPublisher<User, Error>> in
        let tasks = values.map { (userId) -> AnyPublisher<User, Error> in
            let requestURL = URL(string: "https://jsonplaceholder.typicode.com/users/\(userId)")!
            return URLSession.shared.dataTaskPublisher(for: requestURL)
                .map { $0.data }
                .decode(type: User.self, decoder: JSONDecoder())
                .eraseToAnyPublisher()
        }
        return Publishers.MergeMany(tasks)
    }
    .collect()
    .sink(receiveCompletion: { (completion) in
        if case .failure(let error) = completion {
            print("Got error: \(error.localizedDescription)")
        }
    }, receiveValue: { (allUsers) in
        print("Got users:")
        allUsers.forEach { print("\($0)") }
    })
In the example above I use collect to gather all results, which postpones emitting the value to the Sink until all of the network requests have finished successfully; however, you can get rid of the collect and receive each User one by one as the network requests complete.
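For example, a rough sketch of that one-by-one variant could look like this, with maxPublishers added to cap how many requests run at once; it reuses the User type and endpoint from above, and the exact shape is an assumption rather than part of the original answer:

let userIds = [1, 2, 3]

let subscriber = userIds.publisher // emit the IDs one at a time so maxPublishers can throttle them
    .setFailureType(to: Error.self)
    .flatMap(maxPublishers: .max(3)) { userId -> AnyPublisher<User, Error> in
        let requestURL = URL(string: "https://jsonplaceholder.typicode.com/users/\(userId)")!
        return URLSession.shared.dataTaskPublisher(for: requestURL)
            .map { $0.data }
            .decode(type: User.self, decoder: JSONDecoder())
            .eraseToAnyPublisher()
    }
    .sink(receiveCompletion: { completion in
        if case .failure(let error) = completion {
            print("Got error: \(error.localizedDescription)")
        }
    }, receiveValue: { user in
        print("Got user: \(user)") // delivered one by one, in completion order
    })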

Why doesn't URLSession.DataTaskPublisher ever publish values?

In Xcode 11 beta 5 or 6 my existing code that relied on URLSession.DataTaskPublisher stopped working. It seems like DataTaskPublisher is never publishing any values but I can't work out why.
I've tried with .sink and .handleEvents as subscribers. I've tested .sink with a Just publisher and confirmed it receives a value there.
I've also tried both giving the DataTaskPublisher a URL and giving it a URLRequest. I've tried a request to an API including an authorization header, as well as basic requests to google.com and apple.com. I've tried using URLSession.shared and creating a new instance of URLSession. I've also tried with and without map and decode operators.
I've used XCTest expectations to confirm that the test times out every single time, even if I give it a 4-minute timeout.
I just made a new example project and replicated the problem with the following code in the root view controller:
override func viewDidLoad() {
    super.viewDidLoad()
    print("view did load")
    URLSession.shared.dataTaskPublisher(for: URL(string: "http://apple.com")!)
        .handleEvents(receiveSubscription: { (sub) in
            print(sub)
        }, receiveOutput: { (response) in
            print(response)
        }, receiveCompletion: { (completion) in
            print(completion)
        }, receiveCancel: {
            print("cancel")
        }, receiveRequest: { (demand) in
            print(demand)
        })
}
The project prints "view did load" but nothing else ever prints. Any ideas about where I'm going wrong here? Thanks!
I think that there are two problems with your code: firstly, you only have a publisher (handleEvents returns a publisher, not a subscriber), and secondly, that publisher goes out of scope and disappears. This works, although it isn't exactly elegant.
import Combine
import SwiftUI

var pub: AnyPublisher<(data: Data, response: URLResponse), URLError>? = nil
var sub: Cancellable? = nil
var data: Data? = nil
var response: URLResponse? = nil

func combineTest() {
    guard let url = URL(string: "https://apple.com") else {
        return
    }
    pub = URLSession.shared.dataTaskPublisher(for: url)
        .print("Test")
        .eraseToAnyPublisher()
    sub = pub?.sink(
        receiveCompletion: { completion in
            switch completion {
            case .finished:
                break
            case .failure(let error):
                fatalError(error.localizedDescription)
            }
        },
        receiveValue: { data = $0.data; response = $0.response }
    )
}

struct ContentView: View {
    var body: some View {
        Button(
            action: { combineTest() },
            label: { Text("Do It").font(.largeTitle) }
        )
    }
}
I did it in SwiftUI so that I would have less to worry about, and I used 3 variables so that I could follow along better. You need to use the two-parameter sink because the publisher's error type isn't Never. Finally, the print() is just for testing and works really well.
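A more common way to keep the subscription alive is to store the AnyCancellable returned by sink in a property of some long-lived object. A minimal sketch (the ImageLoader type and its method are purely illustrative):

import Combine
import Foundation

final class ImageLoader {
    // Keeping the AnyCancellable alive is what keeps the request alive.
    private var cancellables = Set<AnyCancellable>()

    func load() {
        URLSession.shared.dataTaskPublisher(for: URL(string: "https://apple.com")!)
            .sink(receiveCompletion: { completion in
                print(completion)
            }, receiveValue: { data, _ in
                print("got \(data.count) bytes")
            })
            .store(in: &cancellables) // without this, the subscription is deallocated immediately
    }
}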

Nested async calls in Swift

I'm kind of new to programming in general, so I have this maybe simple question. Actually, writing helps me to identify the problem faster.
Anyway, I have an app with multiple asynchronous calls; they are nested like this:
InstagramUnoficialAPI.shared.getUserId(from: username, success: { (userId) in
    InstagramUnoficialAPI.shared.fetchRecentMedia(from: userId, success: { (data) in
        InstagramUnoficialAPI.shared.parseMediaJSON(from: data, success: { (media) in
            guard let items = media.items else { return }
            self.sortMediaToCategories(media: items, success: {
                print("success")
                // Error Handlers
Looks horrible, but that's not the point. I will investigate PromiseKit once I get this working.
I need sortMediaToCategories to wait for completion and then reload my collection view. However, in sortMediaToCategories I have another nested function, which is async too and has a for-in loop.
func sortMediaToCategories(media items: [StoryData.Items],
                           success: @escaping (() -> Swift.Void),
                           failure: @escaping (() -> Swift.Void)) {
    let group = DispatchGroup()
    group.enter()
    for item in items {
        if item.media_type == 1 {
            guard let url = URL(string: (item.image_versions2?.candidates?.first!.url)!) else { return }
            mediaToStorageDistribution(withImageUrl: url,
                                       videoUrl: nil,
                                       mediaType: .jpg,
                                       takenAt: item.taken_at,
                                       success: { group.notify(queue: .global(), execute: {
                                           self.collectionView.reloadData()
                                           group.leave()
                                       }) },
                                       failure: { print("error") })
            //....
I can't afford to reload the collection view every time, obviously, so I need to wait for the loop to finish and then reload.
I'm trying to use dispatch groups, but I'm struggling with it. Could you please help me with this? Any simple examples and advice would be much appreciated.
The problem you face is a common one: having multiple asynchronous tasks and waiting until all of them have completed.
There are a few solutions. The simplest one is utilising DispatchGroup:
func loadUrls(urls: [URL], completion: @escaping () -> ()) {
    let grp = DispatchGroup()
    urls.forEach { (url) in
        grp.enter()
        URLSession.shared.dataTask(with: url) { data, response, error in
            // handle error
            // handle response
            grp.leave()
        }.resume()
    }
    grp.notify(queue: DispatchQueue.main) {
        completion()
    }
}
The function loadUrls is asynchronous and expects an array of URLs as input and a completion handler that will be called when all tasks have completed. This is accomplished with the DispatchGroup as demonstrated.
The key is to ensure that grp.enter() is called before invoking a task and grp.leave() is called when the task has completed; enter and leave must be balanced.
grp.notify finally registers a closure which will be called on the specified dispatch queue (here: main) when the DispatchGroup grp balances out (that is, its internal counter reaches zero).
There are a few caveats with this solution, though:
All tasks will be started at nearly the same time and run concurrently.
Reporting the final result of all tasks via the completion handler is not shown here; its implementation will require proper synchronisation.
For all of these caveats there are nice solutions which should be implemented utilising suitable third-party libraries. For example, you can submit the tasks to some sort of "executor" which controls how many tasks run concurrently (much like OperationQueue and async Operations), as sketched below.
Many of the "Promise" or "Future" libraries simplify error handling and also help you to solve such problems with just one function call.
You can call reloadData when the last item calls its success block, in this way:
let lastItemIndex = items.count - 1
for (index, item) in items.enumerated() {
    if item.media_type == 1 {
        guard let url = URL(string: (item.image_versions2?.candidates?.first!.url)!) else { return }
        mediaToStorageDistribution(withImageUrl: url,
                                   videoUrl: nil,
                                   mediaType: .jpg,
                                   takenAt: item.taken_at,
                                   success: {
                                       if index == lastItemIndex {
                                           DispatchQueue.global().async {
                                               self.collectionView.reloadData()
                                           }
                                       }
                                   },
                                   failure: { print("error") })
    }
}
You have to move the group.enter() call inside your loop. Calls to enter and leave have to be balanced. If the success and failure callbacks of your mediaToStorageDistribution function are exclusive, you also need to leave the group on failure. When all blocks that called enter have left the group, notify will be called. And you probably want to replace the return in your guard statement with a break, to just skip items with missing URLs; right now you are returning from the whole sortMediaToCategories function.
func sortMediaToCategories(media items: [StoryData.Items], success: @escaping (() -> Void), failure: @escaping (() -> Void)) {
    let group = DispatchGroup()
    for item in items {
        if item.media_type == 1 {
            guard let url = URL(string: (item.image_versions2?.candidates?.first!.url)!) else { break }
            group.enter()
            mediaToStorageDistribution(withImageUrl: url,
                                       videoUrl: nil,
                                       mediaType: .jpg,
                                       takenAt: item.taken_at,
                                       success: { group.leave() },
                                       failure: {
                                           print("error")
                                           group.leave()
                                       })
        }
    }
    group.notify(queue: .main) {
        self.collectionView.reloadData()
    }
}