Let's imagine we have an array of AnObject instances and need to execute the following sequence of actions:
1. send the objects to the backend via separate calls
2. after step 1 finishes, store the array in the DB in a batch
3. after step 2 finishes, do additional processing for each item
We'd want to receive a signal only after all of those steps have executed (or an error occurred). What is the correct way to achieve this via RxSwift, and is it actually possible?
Please find my prototype functions below. Unfortunately I didn't come up with a valid code sample for chaining, so there's nothing to demo.
func makeAPIRequest(object: AnObject) -> Observable<Void> {
...
}
func storeData(data: [AnObject]) -> Observable<Void> {
...
}
func additionalProcessing(object: AnObject) -> Observable<Void> {
...
}
func submitData()
{
let data: [AnObject] = ...;
let apiOperations = data.map{ makeAPIRequest($0) };
let storageOperation = storeData(data);
let processingOperations = data.map{ additionalProcessing($0) };
... // some code to chain steps 1-3
.subscribe { (event) -> Void in
// should be called when operations from step 3 finished
}.addDisposableTo(disposeBag);
}
Let's assume that both makeAPIRequest and additionalProcessing return an Observable<SomeNotVoidType>, and storeData takes an array as its argument and returns an Observable<Array>.
This way, you can do the following:
First, create an array of Observables representing sending the individual objects to the backend, then use the toObservable method so the resulting signals can be transformed later on:
let apiOperations = data.map{ makeAPIRequest($0) }.toObservable()
Then use the merge operator, which will make an Observable that completes only when all of the API calls complete. You can also use the toArray operator, which will put the API call results into one array:
let resultsArray = apiOperations.merge().toArray()
This will get you an Observable<Array<ApiResult>>, which will send one Next event when all API operations complete successfully. Now you can store the results in the database:
let storedResults = resultsArray.flatMap { storeData($0) }
Then again you want to make Observables for each array element, so they'll represent the additional processing. Note that you need to use flatMap and flatMapLatest here, otherwise you'll end up with nested observables like Observable<Observable<SomeType>>.
let additionalProcessingResults = storedResults.flatMap {
return $0.map(additionalProcessing).toObservable()
}.flatMapLatest { return $0 }
Then, you can subscribe for successful completion of the additional processing (or you can do something with its individual results):
additionalProcessingResults.subscribe { (event) -> Void in
// should be called when operations from step 3 finished
}.addDisposableTo(disposeBag);
Note that you don't need all the intermediate variables; I just left them in to describe the steps.
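For reference, here is a sketch of the whole chain with the intermediate variables inlined, using the same pre-RxSwift-3 operators as above. ApiResult and the non-Void signatures are the assumptions described earlier, additionalProcessing is assumed to accept whatever storeData emits, and merge is used in step 3 so that completion waits for every per-item observable:
func submitData(_ data: [AnObject]) {
    data.map { makeAPIRequest($0) }           // step 1: one Observable<ApiResult> per object
        .toObservable()
        .merge()                              // completes when every API call completes
        .toArray()                            // Observable<[ApiResult]>
        .flatMap { storeData($0) }            // step 2: store the whole batch
        .flatMap { storedItems in             // step 3: per-item additional processing
            storedItems.map(additionalProcessing).toObservable().merge()
        }
        .subscribe { event in
            // .Completed arrives only after every step-3 operation has finished;
            // .Error arrives as soon as any step fails
        }
        .addDisposableTo(disposeBag)
}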
I have a complicated class (it's not real code):
class BeReadyInSomeTime {
    var someData: SomeData
    var whenDone: () -> Void
    var isDone: Bool = false
    var highRes: [LongCountedStuff] = []

    init(data: SomeData, whenDone: @escaping () -> Void) {
        self.someData = data
        self.whenDone = whenDone
        // ... prepare `highRes` in background: { makeHighRes() } ...
        // ... and when done set `isDone` to `true`, fire `whenDone()`
    }

    func reset(data: SomeData) {
        self.someData = data
        self.isDone = false
        self.highRes = []
        // ... forget **immediately** about the job from init or reset, start again:
        // { makeHighRes() }
        // ... and when done set `isDone` to `true`, fire `whenDone()`
    }

    var highResolution: AnotherType {
        if isDone {
            return AnotherType(from: highRes)
        } else {
            return AnotherType(from: someData)
        }
    }

    func makeHighRes() {
        var result = [LongCountedStuff]()
        // prepare data, fast
        let intermediateResult = almost()
        // ... turn `intermediateResult` into `result` ...
        self.highRes = result
    }

    func almost() -> [LongCountedStuff] {
        if isNice {
            return countStuff(self.someData)
        } else {
            return []
        }
    }

    func countStuff(_ stuff: [LongCountedStuff], deep: Int = 0) -> [LongCountedStuff] {
        if deep == deepEnough {
            return stuff
        } else {
            let newStuff = stuff.work
            return countStuff(newStuff, deep: deep + 1)
        }
    }
}
Building the highRes array is a recursive function which calls itself many times and sometimes takes seconds, but I need feedback as fast as possible (and it will be one of the someData elements, so I'm safe). As far as I know, I can only 'flag' a DispatchWorkItem as cancelled. If I deliver new data via reset a few times per second (from a mouse drag), the whole block is computed in the background as many times as data was delivered. How do I deal with this kind of problem and really break off the computation of highRes?
If you have a routine that is constantly calling another framework and you want to stop it at the end of one iteration and before it starts the next iteration, then wrapping this in an Operation and checking isCancelled is a good pattern. (You can also use GCD and DispatchWorkItem and use its isCancelled, too, but I find operations do this more elegantly.)
But if you're saying you not only want to cancel your loop, but also hope to stop the time-consuming call within that framework, then no, you can't do that (unless the framework provides some cancellation mechanism of its own). There is no preemptive cancellation: you can't just stop a time-consuming calculation unless you add checks inside that calculation to see whether it has been cancelled.
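For illustration, here is a minimal sketch of that idea applied to a recursive calculation like countStuff: the recursion consults a cancellation check at every level, so flagging the work item actually cuts the computation short. The placeholder type, the depth limit, and initialStuff are all made up for the example:
import Dispatch

struct LongCountedStuff {}                    // stand-in for the question's type
let initialStuff: [LongCountedStuff] = []     // stand-in input

// The recursion checks `isCancelled()` at every level, so a flagged item
// unwinds quickly instead of running for seconds after its result is obsolete.
func countStuff(_ stuff: [LongCountedStuff],
                deep: Int = 0,
                isCancelled: () -> Bool) -> [LongCountedStuff] {
    if isCancelled() || deep >= 10 {          // 10 stands in for "deep enough"
        return stuff
    }
    let newStuff = stuff                      // stands in for the real `stuff.work` step
    return countStuff(newStuff, deep: deep + 1, isCancelled: isCancelled)
}

// Wire the check to a DispatchWorkItem's own `isCancelled` flag.
var work: DispatchWorkItem!
work = DispatchWorkItem {
    let result = countStuff(initialStuff) { work.isCancelled }
    if !work.isCancelled {                    // also drop a stale result
        DispatchQueue.main.async { print("done, \(result.count) items") }  // e.g. fire whenDone()
    }
}
DispatchQueue.global(qos: .userInitiated).async(execute: work)

// Later, e.g. from `reset(data:)`:
work.cancel()   // only sets the flag; the checks above make it take effect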
I’d also ask whether the recursive pattern is right here. Do you really need the results of one calculation in order to start the next? If so, then a recursive (or iterative) pattern is fine. But if the recursive operation is just to pass the next unit of work, then a non-recursive pattern might be better, because it opens up the possibility of doing calculations in parallel.
For example, you might create a concurrent queue with a maxConcurrentOperationCount of some reasonable value (e.g. 4 or 6). Then wrap each individual processing task in its own Operation subclass and have each check its respective isCancelled. Then you can just add all the operations up front and let the queue handle it from there. And when you want to stop them, you can tell the queue to cancelAllOperations. It's a relatively simple pattern, allows you to do calculations in parallel, and is cancelable. But this obviously only works if a given operation is not strictly dependent upon the results of the prior operation(s).
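Here's a minimal sketch of that queue-based pattern, assuming the work can be split into independent chunks (ChunkOperation and the chunk contents are made up for the example):
import Foundation

// One unit of work; it checks `isCancelled` so cancelAllOperations() takes effect.
final class ChunkOperation: Operation {
    private let chunk: [Int]                  // stand-in for one unit of real work
    init(chunk: [Int]) { self.chunk = chunk }

    override func main() {
        guard !isCancelled else { return }
        for _ in chunk {
            if isCancelled { return }         // bail out between expensive steps
            // ... do the expensive processing for this piece ...
        }
    }
}

let queue = OperationQueue()
queue.maxConcurrentOperationCount = 4         // some reasonable degree of parallelism

// Add all the operations up front and let the queue drain them.
let chunks: [[Int]] = Array(repeating: [1, 2, 3], count: 100)
queue.addOperations(chunks.map(ChunkOperation.init), waitUntilFinished: false)

// When new data arrives (e.g. on a mouse drag), abandon the old work:
queue.cancelAllOperations()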
I've created a Combine publisher chain that looks something like this:
let pub = getSomeAsyncData()
.mapError { ... }
.map { ... }
...
.flatMap { data in
let wsi = WebSocketInteraction(data, ...)
return wsi.subject
}
.share().eraseToAnyPublisher()
It's a flow of different possible network requests and data transformations. The calling code wants to subscribe to pub to find out when the whole asynchronous process has succeeded or failed.
I'm confused about the design of the flatMap step with the WebSocketInteraction. That's a helper class that I wrote. I don't think its internal details are important, but its purpose is to provide its subject property (a PassthroughSubject) as the next Publisher in the chain. Internally the WebSocketInteraction uses URLSessionWebSocketTask, talks to a server, and publishes to the subject. I like flatMap, but how do you keep this piece alive for the lifetime of the Publisher chain?
If I store it in the outer object (no problem), then I need to clean it up. I could do that when the subject completes, but if the caller cancels the entire publisher chain then I won't receive a completion event. Do I need to use Publisher.handleEvents and listen for cancellation as well? This seems a bit ugly. But maybe there is no other way...
.flatMap { data in
let wsi = WebSocketInteraction(data, ...)
self.currentWsi = wsi // store in containing object to keep it alive.
wsi.subject.sink(receiveCompletion: { _ in self.currentWsi = nil }, receiveValue: { _ in })
wsi.subject.handleEvents(receiveCancel: {
wsi.closeWebSocket()
self.currentWsi = nil
})
Anyone have any good "design patterns" here?
One design I've considered is making my own Publisher. For example, instead of having WebSocketInteraction vend a PassthroughSubject, it could conform to Publisher. I may end up going this way, but making a custom Combine Publisher is more work, and the documentation steers people toward using a subject instead. To make a custom Publisher you have to implement some of the things that the PassthroughSubject does for you, like responding to demand and cancellation, and keeping state to ensure you complete at most once and don't send events after that.
[Edit: to clarify that WebSocketInteraction is my own class.]
It's not exactly clear what problems you are facing with keeping an inner object alive. The object should be alive so long as something has a strong reference to it.
That strong reference can come either from an external object that starts some async process, or from an internal closure that keeps a reference to self (for example via self.subject.send(...)).
class WebSocketInteraction {
private let subject = PassthroughSubject<String, Error>()
private var isCancelled: Bool = false
init() {
// start some async work
DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
if !self.isCancelled { self.subject.send("Done") } // <-- ref
}
}
// return a publisher that cancels the operation when the subscription is cancelled
var pub: AnyPublisher<String, Error> {
subject
.handleEvents(receiveCancel: {
print("cancel handler")
self.isCancelled = true // <-- ref
})
.eraseToAnyPublisher()
}
}
You should be able to use it as you wanted with flatMap, since the publisher returned by the pub property, and the inner closures, hold a reference to self:
let pub = getSomeAsyncData()
...
.flatMap { data in
let wsi = WebSocketInteraction(data, ...)
return wsi.pub
}
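On the calling side, nothing else needs to be kept alive except the subscription itself. A quick usage sketch of the class above (the prints are just placeholders):
import Combine

var cancellables = Set<AnyCancellable>()

let wsi = WebSocketInteraction()
wsi.pub
    .sink(receiveCompletion: { completion in
        print("finished:", completion)        // sent about a second later by the sketch above
    }, receiveValue: { value in
        print("value:", value)                // "Done"
    })
    .store(in: &cancellables)

// Cancelling the subscription (e.g. cancellables.removeAll()) triggers the
// receiveCancel handler above, which sets isCancelled so nothing is sent afterwards.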
This might be a can of worms, I'll do my best to describe the issue. We have a long running data processing job. Our database of actions is added to nightly and the outstanding actions are processed. It takes about 15 minutes to process nightly actions. In Vapor 2 we utilised a lot of raw queries to create a PostgreSQL cursor and loop through it until it was empty.
For the time being, we run the processing via a command line parameter. In future we wish to have it run as part of the main server so that progress can be checked while processing is being performed.
func run(using context: CommandContext) throws -> Future<Void> {
let table = "\"RecRegAction\""
let cursorName = "\"action_cursor\""
let chunkSize = 10_000
return context.container.withNewConnection(to: .psql) { connection in
return PostgreSQLDatabase.transactionExecute({ connection -> Future<Int> in
return connection.simpleQuery("DECLARE \(cursorName) CURSOR FOR SELECT * FROM \(table)").map { result in
var totalResults = 0
var finished : Bool = false
while !finished {
let results = try connection.raw("FETCH \(chunkSize) FROM \(cursorName)").all(decoding: RecRegAction.self).wait()
if results.count > 0 {
totalResults += results.count
print(totalResults)
// Obviously we do our processing here
}
else {
finished = true
}
}
return totalResults
}
}, on: connection)
}.transform(to: ())
}
Now this doesn't work because I'm calling wait() and I get the error "Precondition failed: wait() must not be called when on the EventLoop" which is fair enough. One of the issues I face is that I have no idea how you even get off the main event loop to run things like this on a background thread. I am aware of BlockingIOThreadPool, but that still seems to operate on the same EventLoop and still causes the error. While I'm able to theorise more and more complicated ways to achieve this, I'm hoping I'm missing an elegant solution which perhaps somebody with better knowledge of SwiftNIO and Fluent could help out with.
Edit: To be clear, the goal of this is obviously not to total up the number of actions in the database. The goal is to use the cursor to process every action synchronously. As I read the results in, I detect changes in the actions and then throw batches of them out to processing threads. When all the threads are busy, I don't start reading from the cursor again until they complete.
There are a LOT of these actions, up to 45 million in a single run. Aggregating promises and recursion didn't seem to be a great idea and when I tried it, just for the sake of it, the server hung.
This is a processing-intensive task that can run for days on a single thread, so I'm not concerned about creating new threads. The issue is that I cannot work out how to use the wait() function inside a Command, as I need a container to create the database connection and the only one I have access to is context.container. Calling wait() on this leads to the above error.
TIA
Ok, so as you know, the problem lies in these lines:
while ... {
...
try connection.raw("...").all(decoding: RecRegAction.self).wait()
...
}
You want to wait for a number of results, and therefore you use a while loop and .wait() for all the intermediate results. Essentially, this turns asynchronous code into synchronous code on the event loop. That is likely to lead to deadlocks and will for sure stall other connections, which is why SwiftNIO tries to detect it and gives you that error. I won't go into the details of why it stalls other connections or why it is likely to lead to deadlocks in this answer.
Let's see what options we have to fix this issue:
1. As you say, we could just run this .wait() on another thread that isn't one of the event loop threads. Any non-EventLoop thread would do: either a DispatchQueue or the BlockingIOThreadPool (which does not run on an EventLoop).
2. We could rewrite your code to be asynchronous.
Both solutions will work, but (1) is really not advisable as you would burn a whole (kernel) thread just to wait for the results. And both Dispatch and BlockingIOThreadPool have a finite number of threads they're willing to spawn, so if you do that often enough you might run out of threads and everything will take even longer.
So let's look into how we can call an asynchronous function multiple times whilst accumulating the intermediate results. And then if we have accumulated all the intermediate results continue with all the results.
To make things easier, let's look at a function that is very similar to yours. We assume this function to be provided, just like in your code:
/// delivers partial results (integers) and `nil` if no further elements are available
func deliverPartialResult() -> EventLoopFuture<Int?> {
...
}
What we would like now is a new function:
func deliverFullResult() -> EventLoopFuture<[Int]>
Please note how deliverPartialResult returns one integer each time, whereas deliverFullResult delivers an array of integers (i.e. all the integers). Ok, so how do we write deliverFullResult without calling deliverPartialResult().wait()?
What about this:
func accumulateResults(eventLoop: EventLoop,
partialResultsSoFar: [Int],
getPartial: @escaping () -> EventLoopFuture<Int?>) -> EventLoopFuture<[Int]> {
// let's run getPartial once
return getPartial().then { partialResult in
// we got a partial result, let's check what it is
if let partialResult = partialResult {
// another intermediate results, let's accumulate and call getPartial again
return accumulateResults(eventLoop: eventLoop,
partialResultsSoFar: partialResultsSoFar + [partialResult],
getPartial: getPartial)
} else {
// we've got all the partial results, yay, let's fulfill the overall future
return eventLoop.newSucceededFuture(result: partialResultsSoFar)
}
}
}
Given accumulateResults, implementing deliverFullResult is not too hard anymore:
func deliverFullResult() -> EventLoopFuture<[Int]> {
return accumulateResults(eventLoop: myCurrentEventLoop,
partialResultsSoFar: [],
getPartial: deliverPartialResult)
}
But let's look more into what accumulateResults does:
1. it invokes getPartial once
2. when that calls back, it checks whether we got
   - a partial result, in which case we remember it alongside the other partialResultsSoFar and go back to (1)
   - nil, which means partialResultsSoFar is all we get, and we return a new succeeded future with everything we have collected so far
That's already it, really. What we did here is to turn the synchronous loop into asynchronous recursion.
Ok, we looked at a lot of code but how does this relate to your function now?
Believe it or not but this should actually work (untested):
accumulateResults(eventLoop: el, partialResultsSoFar: []) {
connection.raw("FETCH \(chunkSize) FROM \(cursorName)")
.all(decoding: RecRegAction.self)
.map { results -> Int? in
if results.count > 0 {
return results.count
} else {
return nil
}
}
}.map { allResults in
return allResults.reduce(0, +)
}
The result of all this will be an EventLoopFuture<Int> which carries the sum of all the intermediate result.count values.
Sure, we first collect all your counts into an array to then sum it up (allResults.reduce(0, +)) at the end which is a bit wasteful but also not the end of the world. I left it this way because that makes accumulateResults be usable in other cases where you want to accumulate partial results in an array.
Now one last thing, a real accumulateResults function would probably be generic over the element type and also we can eliminate the partialResultsSoFar parameter for the outer function. What about this?
func accumulateResults<T>(eventLoop: EventLoop,
getPartial: @escaping () -> EventLoopFuture<T?>) -> EventLoopFuture<[T]> {
// this is an inner function just to hide it from the outside which carries the accumulator
func accumulateResults<T>(eventLoop: EventLoop,
partialResultsSoFar: [T] /* our accumulator */,
getPartial: @escaping () -> EventLoopFuture<T?>) -> EventLoopFuture<[T]> {
// let's run getPartial once
return getPartial().then { partialResult in
// we got a partial result, let's check what it is
if let partialResult = partialResult {
// another intermediate results, let's accumulate and call getPartial again
return accumulateResults(eventLoop: eventLoop,
partialResultsSoFar: partialResultsSoFar + [partialResult],
getPartial: getPartial)
} else {
// we've got all the partial results, yay, let's fulfill the overall future
return eventLoop.newSucceededFuture(result: partialResultsSoFar)
}
}
}
return accumulateResults(eventLoop: eventLoop, partialResultsSoFar: [], getPartial: getPartial)
}
EDIT: After your edit your question suggests that you do not actually want to accumulate the intermediate results. So my guess is that instead, you want to do some processing after every intermediate result has been received. If that's what you want to do, maybe try this:
func processPartialResults<T, V>(eventLoop: EventLoop,
process: @escaping (T) -> EventLoopFuture<V>,
getPartial: @escaping () -> EventLoopFuture<T?>) -> EventLoopFuture<V?> {
func processPartialResults<T, V>(eventLoop: EventLoop,
soFar: V?,
process: @escaping (T) -> EventLoopFuture<V>,
getPartial: @escaping () -> EventLoopFuture<T?>) -> EventLoopFuture<V?> {
// let's run getPartial once
return getPartial().then { partialResult in
// we got a partial result, let's check what it is
if let partialResult = partialResult {
// another intermediate results, let's call the process function and move on
return process(partialResult).then { v in
return processPartialResults(eventLoop: eventLoop, soFar: v, process: process, getPartial: getPartial)
}
} else {
// we've got all the partial results, yay, let's fulfill the overall future
return eventLoop.newSucceededFuture(result: soFar)
}
}
}
return processPartialResults(eventLoop: eventLoop, soFar: nil, process: process, getPartial: getPartial)
}
This will (as before) run getPartial until it returns nil, but instead of accumulating all of getPartial's results, it calls process, which gets the partial result and can do some further processing. The next getPartial call will happen once the EventLoopFuture that process returns is fulfilled.
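For example, wired up to the cursor from the question it might look roughly like this (untested, in the same Vapor 3 style as above; el is the current event loop and processBatch is a made-up stand-in for whatever you do with each chunk):
processPartialResults(
    eventLoop: el,
    process: { (batch: [RecRegAction]) -> EventLoopFuture<Int> in
        // hand the chunk to your own processing code; the returned future
        // completes when this chunk has been fully handled
        return processBatch(batch)
    },
    getPartial: {
        connection.raw("FETCH \(chunkSize) FROM \(cursorName)")
            .all(decoding: RecRegAction.self)
            .map { results -> [RecRegAction]? in
                results.isEmpty ? nil : results
            }
    }
)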
Is that closer to what you would like?
Note: I used SwiftNIO's EventLoopFuture type here; in Vapor you would just use Future instead, but the remainder of the code should be the same.
Here's the generic solution, rewritten for NIO 2.16/Vapor 4, and as an extension to EventLoop
extension EventLoop {
func accumulateResults<T>(getPartial: @escaping () -> EventLoopFuture<T?>) -> EventLoopFuture<[T]> {
// this is an inner function just to hide it from the outside which carries the accumulator
func accumulateResults<T>(partialResultsSoFar: [T] /* our accumulator */,
getPartial: @escaping () -> EventLoopFuture<T?>) -> EventLoopFuture<[T]> {
// let's run getPartial once
return getPartial().flatMap { partialResult in
// we got a partial result, let's check what it is
if let partialResult = partialResult {
// another intermediate results, let's accumulate and call getPartial again
return accumulateResults(partialResultsSoFar: partialResultsSoFar + [partialResult],
getPartial: getPartial)
} else {
// we've got all the partial results, yay, let's fulfill the overall future
return self.makeSucceededFuture(partialResultsSoFar)
}
}
}
return accumulateResults(partialResultsSoFar: [], getPartial: getPartial)
}
}
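A quick usage sketch of the extension (untested; fetchNextBatch is a made-up stand-in for something like the FETCH query above that resolves to nil once the cursor is exhausted):
func countAllActions(on eventLoop: EventLoop,
                     fetchNextBatch: @escaping () -> EventLoopFuture<[RecRegAction]?>) -> EventLoopFuture<Int> {
    return eventLoop.accumulateResults(getPartial: fetchNextBatch)   // EventLoopFuture<[[RecRegAction]]>
        .map { batches in
            batches.reduce(0) { $0 + $1.count }                      // total rows seen
        }
}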
How do I properly (from multithreading point of view) pass a closure to another thread?
Consider a situation:
class NetManager {
...
var processingClosure: (Data, DispatchQueue, @escaping (Data?) -> ()) -> () = {
respData, complQueue, complClosure in
let resultData = // process respData according to some logic and get resultData
complQueue.async {
complClosure(resultData)
}
// PLEASE NOTE that there is no captured variables in this closure
}
...
func requestData1(..., complClosure) {
// this is main thread context
// make request to endpoint 1 somehow and process result in separate processing queue
...
let procClosure = self.processingClosure
// processingQueue is NOT main queue and not completion queue
request.processingQueue.async {
// Question HERE:
procClosure(data, DispatchQueue.main, complClosure)
// is such passing of the closure safe? Can I have issues with concurrency?
}
}
func requestData2(..., complClosure) {
// the same as requestData1 but gets data from endpoint 2
...
let procClosure = self.processingClosure
request.processingQueue.async {
procClosure(data, DispatchQueue.main, complClosure)
}
}
}
This seems like a safe way to pass the closure, since it doesn't capture any variables. Will I have any concurrency issues with the procClosure call?
Is there a better way to encapsulate the common data-transformation functionality for reuse in similar requests to different endpoints (I can encapsulate only the data processing, not the requesting)?
I have to fetch three types of data (AType, BType, CType) using three separate API requests. The objects returned by the APIs are related by one-to-many:
1 AType object is parent of N BType objects
1 BType object is parent of P CType objects
I'm using the following three functions to fetch each type:
func get_A_objects() -> Observable<AType> { /* code here */ }
func get_B_objects(a_parentid:Int) -> Observable<BType> { /* code here */}
func get_C_objects(b_parentid:Int) -> Observable<CType> { /* code here */}
and to avoid nested subscriptions, these three functions are chained using flatMap:
func getAll() -> Observable<CType> {
return self.get_A_objects()
.flatMap { (aa:AType) in return get_B_objects(aa.id) }
.flatMap { (bb:BType) in return get_C_objects(bb.id) }
}
func setup() {
self.getAll().subscribeNext { _ in
print ("One more item fetched")
}
}
The above code works fine: when there are M objects of AType, I can see the text "One more item fetched" printed MxNxP times.
I'd like to setup the getAll() function to deliver status updates throughout the chain using ReplaySubject<String>. My initial thought is to write something like:
func getAll() -> ReplaySubject<String> {
let msg = ReplaySubject<String>.createUnbounded()
self.get_A_objects().doOnNext { aobj in msg.onNext ("Fetching A \(aobj)") }
.flatMap { (aa:AType) in
return get_B_objects(aa.id).doOnNext { bobj in msg.onNext ("Fetching B \(bobj)") }
}
.flatMap { (bb:BType) in
return get_C_objects(bb.id).doOnNext { cobj in msg.onNext ("Fetching C \(cobj)") }
}
return msg
}
but this attempt failed, i.e., the following print() does not print anything.
getAll().subscribeNext {
print ($0)
}
How should I rewrite my logic?
Problem
It's because you're not retaining your Disposables, so they're being deallocated immediately, and thus do nothing.
In getAll, you create an Observable<AType> via get_A_objects(), yet it is not added to a DisposeBag. When it goes out of scope (at the end of the func), it will be deallocated. So { aobj in msg.onNext ("Fetching A \(aobj)") } will never happen (or at least isn't likely to, if it's async).
Also, you aren't retaining the ReplaySubject<String> returned from getAll(), nor the Disposable returned by its subscribeNext call. So for the same reason, this would also be a deal-breaker.
Solution
Since you want two observables, one for the actual final results (Observable<CType>) and one for the progress status (ReplaySubject<String>), you should return both from your getAll() function, so that both can be "owned" and their lifetimes managed.
func getAll() -> (Observable<CType>, ReplaySubject<String>) {
let progress = ReplaySubject<String>.createUnbounded()
let results = self.get_A_objects()......
return (results, progress)
}
let (results, progress) = getAll()
progress
.subscribeNext {
print ($0)
}
.addDisposableTo(disposeBag)
results
.subscribeNext {
print ($0)
}
.addDisposableTo(disposeBag)
Some notes:
You shouldn't need to use createUnbounded, which could be dangerous if you aren't careful.
You probably don't really want to use ReplaySubject at all, since it would be a lie to say that you're "fetching" something later if someone subscribes after, and gets an old progress status message. Consider using PublishSubject.
If you follow the above recommendation, then you just need to make sure that you subscribe to progress before results to be sure that you don't miss any progress status messages, since the output won't be buffered anymore.
Also, just my opinion, but I would re-word "Fetching X Y" to something else, since you aren't "fetching", but you have already "fetched" it.
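Putting those notes together, here is one way the reworked getAll() might look (a sketch in the same RxSwift 2.x style as the question, using PublishSubject instead of ReplaySubject and exposing the progress stream as a plain Observable; the message wording is just an example):
func getAll() -> (Observable<CType>, Observable<String>) {
    let progress = PublishSubject<String>()

    let results = self.get_A_objects()
        .doOnNext { aobj in progress.onNext("Fetched A \(aobj)") }
        .flatMap { (aa: AType) in
            return get_B_objects(aa.id)
                .doOnNext { bobj in progress.onNext("Fetched B \(bobj)") }
        }
        .flatMap { (bb: BType) in
            return get_C_objects(bb.id)
                .doOnNext { cobj in progress.onNext("Fetched C \(cobj)") }
        }

    return (results, progress.asObservable())
}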