I have a class Complicated, where (it's not a real code):
class BeReadyInSomeTime {
var someData: SomeData
var whenDone: () -> Void
var isDone: Bool = false
var highRes: [LongCountedStuff] = []
init(data:SomeData, whenDone: #escaping () - >Void) {
self.someData = someData
self.whenDone = whenDone
... prepare `highRes` in background...
{ makeHighRes() }
... and when done set `isDone` to `true`, fire `whenDone()`
}
func reset(data:SomeData) {
self.someData = someData
self.isDone = false
self.highRes = []
... forget **immediately** about job from init or reset, start again
{ makeHighRes() }
... and when done set `isDone` to `true`, fire `whenDone()`
}
var highResolution:AnotherType {
if isDone {
return AnotherType(from: highRes)
} else {
return AnotherType(from: someData)
}
}
func makeHighRes() {
var result = [LongCountedStuff]
// prepare data, fast
let some intermediateResult = almost ()
self.highRes = result
}
func almost() -> [LongCountedStuff] {
if isNice {
return countStuff(self.someData)
} else {
return []
}
func countStuff(stuff:[LongCountedStuff], deep:Int = 0) -> [LongCountedSuff] {
if deep == deep enough {
return stuff
} else {
let newStuff = stuff.work
count(newStuff, deep: deep+1)
}
}
Making highRes array is a recurrent function which calls itself many times and sometimes it takes seconds, but I need feedback as fast as possible (and it will be one of someData elements, so I'm safe). As far I know, I can only 'flag' DispatchWorkItem that's cancelled. If I deliver new data by reset few times per second (form mouse drag) whole block is counted in background as many times as data was delivered. How to deal with this kind of problem? To really break counting highRes?
If you have a routine that is constantly calling another framework and you want to stop it at the end of one iteration and before it starts the next iteration, then wrapping this in an Operation and checking isCancelled is a good pattern. (You can also use GCD and DispatchWorkItem and use its isCancelled, too, but I find operations do this more elegantly.)
But if you’re saying you not only want to cancel your loop, but also hope to stop the consuming call within that framework, then, no, you can’t do that (unless the framework provides some cancelation mechanism of its own). But there is no preemptive cancellation. You can’t just stop a time consuming calculation unless you add checks inside that calculation to check to see if it has been canceled.
I’d also ask whether the recursive pattern is right here. Do you really need the results of one calculation in order to start the next? If so, then a recursive (or iterative) pattern is fine. But if the recursive operation is just to pass the next unit of work, then a non-recursive pattern might be better, because it opens up the possibility of doing calculations in parallel.
For example, you might create a concurrent queue with a maxConcurrencyCount of some reasonable value (e.g. 4 or 6). Then wrap each individual processing task in its own Operation subclass and have each check its respective isCancelled. Then you can just add all the operations up front, and let the queue handle it from there. And when you want to stop them, you can tell the queue to cancelAllOperations. It’s a relative simple pattern, allows you to do calculations in parallel, and is cancelable. But this obviously only works if a given operations is not strictly dependent upon the results of the prior operation(s).
Related
I have a value type for which a relatively long calculation is require to produce it (> 1s). I wrap this value type in an enumeration that expresses whether it is currently being calculated or is available:
enum Calculatable<T> {
case calculated(T), calculating
}
The problem is this value is persisted. This means after the long running calculation, when I have the updated value, I try to persist it first before changing the property visible to the business logic. If persistence succeeds, the property is updated, if it fails, I want to wait and then try to persist again; with a catch — if any other values have changed that the calculated one depends on, we throw away the previously calculated value — stop trying to persist it — and submit a new Calculation operation to the queue — starting all over again. My initial attempt at that looked something like this:
/// Wraps a value that takes a long time to be calculated.
public class CalculatedValueWrapper<T> {
/// The last submitted `Calculation` to the queue.
private weak var latestCalculation: Optional<Operation>
/// The long running calculation that produces the up to date, wrapped value.
private let longCalculation: (_ cancelIf: () -> Bool) -> T
/// Calculation queue.
private let queue: OperationQueue
/// The calculated value.
private var wrappedValue: Calculatable<T>
/// Update the wrapped value.
public func update() {
// Don't set if already being calculated.
if case .calculated(_)=self.wrappedValue { self.wrappedValue = .calculating }
// Initiate new calculation.
self.latestCalculation?.cancel()
self.latestCalculation={
let calc: Calculation = .init(self, self.longCalculation)
self.queue.addOperation(calc)
return calc
}()
}
}
/// Executes a long running calculation and persists the result once complete.
class Calculation<T>: Operation {
/// The calculation.
private let calculation: (_ cancelIf: () -> Bool) -> T
/// The owner of the wrapped value we're calculating.
private weak var owner: Optional<CalculatedValueWrapper<T>>
func main() {
let result=self.calculation(cancelIf: { [unowned self] in self.isCancelled })
let persist: () -> Bool={
// In case the persistence fails.
var didPersist=false
// Persist the result on the main thread.
DispatchQueue.main.sync { [unowned self] in
// The owner may have been deallocated between the time this dispatch item was submitted and the time it began executing.
guard let owner=self.owner else { self.cancel(); return }
// May have been cancelled.
guard !self.isCancelled else { return }
// Attempt to persist the calculated result.
if let _=try? owner.persist(result) {
didPersist=true
}
}
// Done.
return didPersist
}
// Persist the new result. If it fails, and we're not cancelled, keep trying until it succeeds.
while !self.isCancelled && !self.persist() {
usleep(500_000)
}
}
}
I would have stuck with this, but after further research I noticed a prevailing sentiment that it was bad practice to sleep the thread in DispatchQueue items and Operations. The alternative that seems to be considered better, in the case of DispatchQueue for example, was to initiate another DispatchQueue item using asyncAfter(deadline:execute:).
That solution appears more complicated in my case as being able to cancel a single operation that encapsulates everything that needs to be done makes cancelling easy. I can hold a reference to the last Calculation operation executed, and if another one is submitted on the main thread while one is in progress, I cancel the old one, add the new one, and with that I know the old value won't be persisted because it is done on the main thread after cancellation has already been performed; and a Calculation must not be cancelled in order for it to persist its result.
Is there something about DispatchQueues or OperationQueues that would make the alternative solution more straightforward to implement in this case? Or is sleeping here totally fine?
Make a RetryPersist NSOperation. Just before adding it to the queue, set its isReady to false. Have it do a dispatchAfter weakly capturing itself to set its isReady to true. Now you can cancel any RetryPersist operations in the queue, if the count of them where not ready, not cancelled is not zero you know there’s one waiting to go etc.
I was looking at this answer that provides code for a thread safe array with concurrent reads. As #tombardey points out in the comments the code (relevant snippet below) is not completely safe:
public func removeAtIndex(index: Int) {
self.accessQueue.async(flags:.barrier) {
self.array.remove(at: index)
}
}
public var count: Int {
var count = 0
self.accessQueue.sync {
count = self.array.count
}
return count
}
...Say the sychronized array has
one element, wouldn't this fail? if synchronizedArray.count == 1 {
synchronizedArray.remove(at: 0) } It's a race condition, say two
threads execute the statement. Both read a count of 1 concurrently,
both enqueue a write block concurrently. The write blocks execute
sequentially, the second one will fail... (cont.)
#Rob replies:
#tombardey - You are absolutely right that this level of
synchronization (at the property/method level) is frequently
insufficient to achieve true thread-safety in broader applications.
Your example is easily solved (by adding an method that dispatches
block to the queue), but there are others that aren't (e.g.
"synchronized" array simultaneously used by a UITableViewDataSource
and mutated by some background operation). In those cases, you have to
implement your own higher-level synchronization. But the above
technique is nonetheless very useful in certain, highly constrained
situations.
I am struggling to work out what #Rob means by "Your example is easily solved (by adding an method that dispatches block to the queue)". I would be interested to see an example implementation of this method (or any other) technique to solve the problem.
This is a very good example of why "atomic" mutable operations on individual properties are rarely sufficient, and are dangerous to add without a great deal of care.
The fundamental problem in this example is that any time the array is modified, it invalidates existing indices. In order to safely use an index, you must ensure that the entire "fetch an index, use the index" operation is atomic. You can't just ensure that each piece is atomic. There is no safe way to write removeAtIndex in isolation, because there is no safe way to acquire the index you pass. Between the time you fetch the index, and the time you use it, the array may have been changed in arbitrary ways.
The point is that there's no such thing as a "thread-safe (mutable) array" that you can use just like a normal array and not have to worry about concurrency issues. A "thread-safe" mutable array cannot return or accept indices, because its indices aren't stable. Exactly what data structure is appropriate depends on the problem you're trying to solve. There's no one answer here.
In most cases the answer is "less concurrency." Rather than trying to manage concurrent access to individual data structures, think about larger-scoped "units of work" that carry all their own data and have exclusive access to it. Put those larger units of work onto queues. (In many cases, even this is overkill. You'd be shocked how often adding currency makes things slower if you don't design it very carefully.) For more recommendations, see Modernizing Grand Central Dispatch Usage.
You said:
I am struggling to work out what #Rob means by “Your example is easily solved (by adding [a] method that dispatches block to the queue)”. I would be interested to see an example implementation of this method (or any other) technique to solve the problem.
Let’s expand upon the example that I posted in response to your other question (see point 3 in this answer), adding a few more Array methods:
class SynchronizedArray<T> {
private var array: [T]
private let accessQueue = DispatchQueue(label: "com.domain.app.reader-writer", attributes: .concurrent)
init(_ array: [T] = []) {
self.array = array
}
subscript(index: Int) -> T {
get { reader { $0[index] } }
set { writer { $0[index] = newValue } }
}
var count: Int {
reader { $0.count }
}
func append(newElement: T) {
writer { $0.append(newElement) }
}
func remove(at index: Int) {
writer { $0.remove(at: index) }
}
func reader<U>(_ block: ([T]) throws -> U) rethrows -> U {
try accessQueue.sync { try block(array) }
}
func writer(_ block: #escaping (inout [T]) -> Void) {
accessQueue.async(flags: .barrier) { block(&self.array) }
}
}
So, let’s imagine that you wanted to delete an item if there was only one item in the array. Consider:
let numbers = SynchronizedArray([42])
...
if numbers.count == 1 {
numbers.remove(at: 0)
}
That looks innocent enough, but it is not thread-safe. You could have a race condition if other threads are inserting or removing values. E.g., if some other thread appended a value between the time you tested the count and when you went to remove the value.
You can fix that by wrapping the whole operation (the if test and the consequent removal) in a single block that is synchronized. Thus you could:
numbers.writer { array in
if array.count == 1 {
array.remove(at: 0)
}
}
This writer method (in this reader-writer-based synchronization) is an example of what I meant by a “method that dispatches block to the queue”.
Now, clearly, you could also give your SynchronizedArray its own method that did this for you, e.g.:
func safelyRemove(at index: Int) {
writer { array in
if index < array.count {
array.remove(at: index)
}
}
}
Then you can do:
numbers.safelyRemove(at: index)
... and that is thread-safe, but still enjoys the performance benefits of reader-writer synchronization.
But the general idea is that when dealing with a thread-safe collection, you invariably have a series of tasks that you will want to synchronize together, at a higher level of abstraction. By exposing the synchronization methods of reader and writer, you have a simple, generalized mechanism for doing that.
All of that having been said, as others have said, the best way to write thread-safe code is to avoid concurrent access altogether. But if you must make a mutable object thread-safe, then it is the responsibility of the caller to identify the series of tasks that must be performed as a single, synchronized operation.
I see several problems with the posted code and example:
Function removeAtIndex is not checking whether it can actually remove at provided index. So it should be changed to
public func removeAtIndex(index: Int) {
// Check if it even makes sense to schedule an update
// This is optional, but IMO just a better practice
guard count > index else { return }
self.accessQueue.async(flags: .barrier) {
// Check again before removing to make sure array didn't change
// Here we can actually check the size of the array, since other threads are blocked
guard self.array.count > index else { return }
self.array.remove(at: index)
}
}
Usage of a thread-safe class also implies that you use one operation to both check and do operation on an item that is supposed to be thread-safe. So if you checking array size and then removing it, you are breaking that thread-safety envelope, it's not a correct use of the class. The particular case synchronizedArray.count == 1 { synchronizedArray.remove(at: 0) } is resolved with adjustments to function above (you don't need to check count anymore, as function already does that). But if you still needed a function that both, verifies count, and then removes an item, you would have to create a function in your thread-safe class that does both operations with no possibility that other threads modify an array in between. You may even need 2 functions: synchronizedArray.getCountAndRemove (get count, then remove), and synchronizedArray.removeAndGetCount` (remove, then get count).
public func getCountAndRemoveAtIndex(index: Int) -> Int {
var currentCount = count
guard currentCount > index else { return currentCount }
// Has to run synchronously to ensure the value is returned
self.accessQueue.sync {
currentCount = self.array.count
guard currentCount > index else { break }
self.array.remove(at: index)
}
return currentCount
}
In general removing item at index for an array that is used from multiple threads, is quite meaningless. You can't even be sure what you are removing. Maybe there are some cases where it would make sense, but usually it makes more sense to either remove by some logic (e.g. specific value), or have a function that returns the value of the item it removes (e.g. func getAndRemoveAtIndex(index: Int) -> T)
Always test every function and combination of them. For example if original poster tested removal like this:
let array = SynchronizedArray<Int>()
array.append(newElement: 1)
array.append(newElement: 2)
array.append(newElement: 3)
DispatchQueue.concurrentPerform(iterations: 5) {_ in
array.removeAtIndex(index: 0)
}
He would get a Fatal error: Index out of range: file Swift/Array.swift, line 1298 in 2 out of 5 threads, so it would be clear that original implementation of this function is not right. Try the same test with the function I posted above, and you will see the difference.
BTW we are only talking about removeAtIndex, but subscript has a similar problem as well. But interestingly first() is implemented correctly.
I've created a Combine publisher chain that looks something like this:
let pub = getSomeAsyncData()
.mapError { ... }
.map { ... }
...
.flatMap { data in
let wsi = WebSocketInteraction(data, ...)
return wsi.subject
}
.share().eraseToAnyPublisher()
It's a flow of different possible network requests and data transformations. The calling code wants to subscribe to pub to find out when the whole asynchronous process has succeeded or failed.
I'm confused about the design of the flatMap step with the WebSocketInteraction. That's a helper class that I wrote. I don't think its internal details are important, but its purpose is to provide its subject property (a PassthroughSubject) as the next Publisher in the chain. Internally the WebSocketInteraction uses URLSessionWebSocketTask, talks to a server, and publishes to the subject. I like flatMap, but how do you keep this piece alive for the lifetime of the Publisher chain?
If I store it in the outer object (no problem), then I need to clean it up. I could do that when the subject completes, but if the caller cancels the entire publisher chain then I won't receive a completion event. Do I need to use Publisher.handleEvents and listen for cancellation as well? This seems a bit ugly. But maybe there is no other way...
.flatMap { data in
let wsi = WebSocketInteraction(data, ...)
self.currentWsi = wsi // store in containing object to keep it alive.
wsi.subject.sink(receiveCompletion: { self.currentWsi = nil })
wsi.subject.handleEvents(receiveCancel: {
wsi.closeWebSocket()
self.currentWsi = nil
})
Anyone have any good "design patterns" here?
One design I've considered is making my own Publisher. For example, instead of having WebSocketInteraction vend a PassthroughSubject, it could conform to Publisher. I may end up going this way, but making a custom Combine Publisher is more work, and the documentation steers people toward using a subject instead. To make a custom Publisher you have to implement some of things that the PassthroughSubject does for you, like respond to demand and cancellation, and keep state to ensure you complete at most once and don't send events after that.
[Edit: to clarify that WebSocketInteraction is my own class.]
It's not exactly clear what problems you are facing with keeping an inner object alive. The object should be alive so long as something has a strong reference to it.
It's either an external object that will start some async process, or an internal closure that keeps a strong reference to self via self.subject.send(...).
class WebSocketInteraction {
private let subject = PassthroughSubject<String, Error>()
private var isCancelled: Bool = false
init() {
// start some async work
DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
if !isCancelled { self.subject.send("Done") } // <-- ref
}
}
// return a publisher that can cancel the operation when
var pub: AnyPublisher<String, Error> {
subject
.handleEvents(receiveCancel: {
print("cancel handler")
self.isCancelled = true // <-- ref
})
.eraseToAnyPublisher()
}
}
You should be able to use it as you wanted with flatMap, since the pub property returned publisher, and the inner closure hold a reference to self
let pub = getSomeAsyncData()
...
.flatMap { data in
let wsi = WebSocketInteraction(data, ...)
return wsi.pub
}
Problem is how to wait for an async query on HealthKit to return a result BEFORE allowing execution to move on. The returned data is critical for further execution.
I know this has been asked/solved many times and I have read many of the posts, however I have tried completion handlers, Dispatch sync and Dispatch Groups and have not been able to come up with an implementation that works.
Using completion handler
per Wait for completion handler to finish - Swift
This calls a method to run a HealthKit Query:
func readHK() {
var block: Bool = false
hk.findLastBloodGlucoseInHealthKit(completion: { (result) -> Void in
block = true
if !(result) {
print("Problem with HK data")
}
else {
print ("Got HK data OK")
}
})
while !(block) {
}
// now move on to the next thing ...
}
This does work. Using "block" variable to hold execution pending the callback in concept seems not that different from blocking semaphores, but it's really ugly and asking for trouble if the completion doesn't return for whatever reason. Is there a better way?
Using Dispatch Groups
If I put Dispatch Group at the calling function level:
Calling function:
func readHK() {
var block: Bool = false
dispatchGroup.enter()
hk.findLastBloodGlucoseInHealthKit(dg: dispatchGroup)
print ("Back from readHK")
dispatchGroup.notify(queue: .main) {
print("Function complete")
block = true
}
while !(block){
}
}
Receiving function:
func findLastBloodGlucoseInHealthKit(dg: DispatchGroup) {
print ("Read last HK glucose")
let sortDescriptor = NSSortDescriptor(key: HKSampleSortIdentifierEndDate, ascending: false)
let query = HKSampleQuery(sampleType: glucoseQuantity!, predicate: nil, limit: 10, sortDescriptors: [sortDescriptor]) { (query, results, error) in
// .... other stuff
dg.leave()
The completion executes OK, but the .notify method is never called, so the block variable is never updated, program hangs and never exits from the while statement.
Put Dispatch Group in target function but leave .notify at calling level:
func readHK() {
var done: Bool = false
hk.findLastBloodGlucoseInHealthKit()
print ("Back from readHK")
hk.dispatchGroup.notify(queue: .main) {
print("done function")
done = true
}
while !(done) {
}
}
Same issue.
Using Dispatch
Documentation and other S.O posts say: “If you want to wait for the block to complete use the sync() method instead.”
But what does “complete” mean? It seems that it does not mean complete the function AND get the later async completion. For example, the below does not hold execution until the completion returns:
func readHK() {
DispatchQueue.global(qos: .background).sync {
hk.findLastBloodGlucoseInHealthKit()
}
print ("Back from readHK")
}
Thank you for any help.
Yes, please don't fight the async nature of things. You will almost always lose, either by making an inefficient app (timers and other delays) or by creating opportunities for hard-to-diagnose bugs by implementing your own blocking functions.
I am far from a Swift/iOS expert, but it appears that your best alternatives are to use Grand Central Dispatch, or one of the third-party libraries for managing async work. Look at PromiseKit, for example, although I haven't seen as nice a Swift Promises/Futures library as JavaScript's bluebird.
You can use DispatchGroup to keep track of the completion handler for queries. Call the "enter" method when you set up the query, and the "leave" at the end of the results handler, not after the query has been set up or executed. Make sure that you exit even if the query is completed with an error. I am not sure why you are having trouble because this works fine in my app. The trick, I think, is to make sure you always "leave()" the dispatch group no matter what goes wrong.
If you prefer, you can set a barrier task in the DispatchQueue -- this will only execute when all of the earlier tasks in the queue have completed -- instead of using a DispatchGroup. You do this by adding the correct options to the DispatchWorkItem.
I'm looping through a table's rows, for each of them I'm doing a couple of async calls like fetching data from API, copying files, running shell script... How do I wait for the result until going to the next one.
Also I'm new to Swift, not sure if this is the best way to handle a group of async tasks. Should I use concurrency in this case ?
tableView.selectedRowIndexes.forEach { row in
myData.fetch(url: urlList[row]) { res in
self.anotherAsyncCall(res) { data in
//continue to deal with next row now
}
}
}
If you really want to do this sequentially, the easiest way is to perform your tasks recursively, actually invoking the next task in the completion handler of the prior one:
processNext(in: tableView.selectedRowIndexes) {
// do something when they're all done
}
Where:
func processNext(in rows: [Int], completion: #escaping () -> Void) {
guard let row = rows.first else {
completion()
return
}
myData.fetch(url: urlList[row]) { res in
self.anotherAsyncCall(res) { data in
//continue to deal with next row now
self.processNext(in: Array(rows.dropFirst()), completion: completion)
}
}
}
But I agree with GoodSp33d that the other approach is to wrap this asynchronous process in a custom, asynchronous, Operation subclass.
But this begs the question why you want to do these sequentially. You will pay a significant performance penalty because of the inherent network latency for each request. So the alternative is to let them run concurrently, and use dispatch group to know when they're done:
let group = DispatchGroup()
tableView.selectedRowIndexes.forEach { row in
group.enter()
myData.fetch(url: urlList[row]) { res in
self.anotherAsyncCall(res) { data in
//continue to deal with next row now
group.leave()
}
}
}
group.notify(queue: .main) {
// do something when they're all done
}
Whether you can run these concurrently (or to what degree) is a function of what you're doing inside various asynchronous methods. But I would suggest you think hard about making this work concurrently, as the performance is likely to be much better.
If you are using some promise library, just use the all function.
Here is some Document about promise.all()
And PromiseKit use when instead,
you can read about the faq and the tutorial about when for more information.
If you want to do that without any promise library, here is the pseudocode:
var results = []
rows.forEach {row in
fetch(row) {res in
results.push(res)
if(results.length == rows.length) {
// do something using the results here
}
}
}