I am trying to do some calculations on a large number of objects. The objects are saved in an array and the results of the operation should be saved in a new array. To speed up the processing, I'm trying to break up the task into multiple subtasks which can run concurrently on different threads. The simplified example code below replaces the actual operation with two seconds of wait.
I have tried multiple ways of solving this issue, using both DispatchQueues and Tasks.
Using DispatchQueue
The basic setup I used is the following:
import Foundation
class Main {
let originalData = ["a", "b", "c"]
var calculatedData = Set<String>()
func doCalculation() {
//calculate length of array slices.
let totalLength = originalData.count
let sliceLength = Int(totalLength / 3)
var start = 0
var end = 0
let myQueue = DispatchQueue(label: "Calculator", attributes: .concurrent)
var allPartialResults = [Set<String>]()
for i in 0..<3 {
if i != 2 {
start = sliceLength * i
end = start + sliceLength - 1
} else {
start = totalLength - sliceLength * (i - 1)
end = totalLength - 1
}
allPartialResults.append(Set<String>())
myQueue.async {
allPartialResults[i] = self.doPartialCalculation(data: Array(self.originalData[start...end]))
}
}
myQueue.sync(flags: .barrier) {
for result in allPartialResults {
self.calculatedData.formUnion(result)
}
}
//do further calculations with the data
}
func doPartialCalculation(data: [String]) -> Set<String> {
print("began")
sleep(2)
let someResultSet: Set<String> = ["some result"]
print("ended")
return someResultSet
}
}
As expected, the Console Log is the following (with all three "ended" appearing at once, two seconds after all three "began" appeared at once):
began
began
began
ended
ended
ended
When measuring performance using os_signpost (and using real data and calculations), this approach reduces the time needed for the entire doCalculation() function to run from 40ms to around 14ms.
Note that to avoid data races when appending the results to the final calculatedData set, I created an array of partial result sets, of which each dispatched block only accesses one index (not a solution I like, and the main reason why I am not satisfied with this approach). What I would have liked to do is call DispatchQueue.main from within myQueue and add the new data to the calculatedData set on the main thread; however, calling DispatchQueue.main.sync causes a deadlock, and using the async version leads to the barrier flag not working as intended.
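For illustration, this is roughly the pattern I would have liked to use. It deadlocks because doCalculation() runs on the main thread, which then blocks in the barrier sync while each worker in turn blocks waiting for the main queue:
myQueue.async {
    let partial = self.doPartialCalculation(data: Array(self.originalData[start...end]))
    DispatchQueue.main.sync { // deadlock: main is stuck in myQueue.sync(flags: .barrier) below
        self.calculatedData.formUnion(partial)
    }
}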
Using Tasks
In a second attempt, I tried using Tasks to run code concurrently. As I understand it, there are two options for running code concurrently with Tasks: async let and withTaskGroup. For the purpose of retrieving a variable number of partial results from a variable number of concurrent tasks, I figured withTaskGroup was the best option for me.
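For context, async let binds a fixed number of child tasks known at compile time, roughly like this (sliceA and sliceB being hypothetical sub-arrays):
async let first = doPartialCalculation(data: sliceA)
async let second = doPartialCalculation(data: sliceB)
let combined = await first.union(second)
Since my number of slices can vary at runtime, that pattern doesn't fit here.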
I modified the code to look like this:
class Main {
let originalData = ["a", "b", "c"]
var calculatedData = Set<String>()
func doCalculation() async {
//calculate length of array slices.
let totalLength = originalData.count
let sliceLength = Int(totalLength / 3)
var start = 0
var end = 0
await withTaskGroup(of: Set<String>.self) { group in
for i in 0..<3 {
if i != 2 {
start = sliceLength * i
end = start + sliceLength - 1
} else {
start = totalLength - sliceLength * (i - 1)
end = totalLength - 1
}
group.addTask {
return await self.doPartialCalculation(data: Array(self.originalData[start...end]))
}
}
for await newSet in group {
calculatedData.formUnion(newSet)
}
}
//do further calculations with the data
}
func doPartialCalculation(data: [String]) async -> Set<String> {
print("began")
try? await Task.sleep(nanoseconds: UInt64(2e9)) // 2 seconds, matching the DispatchQueue example
let someResultSet: Set<String> = ["some result"]
print("ended")
return someResultSet
}
}
However, the Console Log prints the following (with every "ended" coming 2 seconds after the preceding "began"):
began
ended
began
ended
began
ended
Measuring performance using os_signpost revealed that the operation takes 40ms to complete. Therefore it is not running concurrently.
With that being said, what is the best course of action for this problem?
Using DispatchQueue, how do you dispatch to the main queue from within another queue to avoid data races, while at the same time preserving a barrier flag later in the code?
Using Task, how can you actually make them run concurrently?
EDIT
By running the code on a real device instead of the simulator and changing the sleep function inside the Task from sleep() to Task.sleep(), I was able to achieve concurrent behavior, in that the Console prints the expected log. However, the operation time for the task remains upwards of 40-50ms and is highly variable, sometimes reaching 200ms or more. This problem remains after adding the .userInitiated priority to the Task.
Why does it take so much longer to run the same operation concurrently using Task compared to using DispatchQueue? Am I missing something?
A few observations:
One possible performance difference is that the simulator artificially constrains the “cooperative thread pool” used by async-await. See Maximum number of threads with async-await task groups. This is one cause of a lack of full concurrency (on the simulator).
In the async-await test, another factor that can affect concurrency is an actor. If an actor is enforcing serial execution, then consider declaring doPartialCalculation as nonisolated, so that it allows concurrent execution. Failure to do so can prevent any concurrent execution (with your sleep scenario, for example).
The fact that you saw a significant performance difference when you went from sleep to Task.sleep makes me wonder if you might have done this within an actor. Actors are "reentrant" and Task.sleep suspends execution, letting the actor switch to another task. So it allows concurrency for a series of async methods.
But Task.sleep is not analogous to a computationally intensive task that ties up the thread. By declaring the function nonisolated, however, you can achieve concurrent execution for computationally intensive processes, too, with performance results that are nearly equivalent to what you achieved with your GCD implementation.
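For example, if this code lives inside an actor, a minimal sketch of the nonisolated approach might look like this (a hypothetical Calculator actor mirroring the question's class):
actor Calculator {
    let originalData = ["a", "b", "c"]
    var calculatedData = Set<String>()

    // nonisolated: this method no longer runs on the actor's serial executor,
    // so the task group's child tasks can execute in parallel. The trade-off is
    // that it cannot touch mutable actor state such as calculatedData.
    nonisolated func doPartialCalculation(data: [String]) -> Set<String> {
        Set(data.map { $0.uppercased() }) // stand-in for the real work
    }
}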
That having been said, you might still find that async-await is a tiny bit slower than pure GCD implementations. Then again, Swift concurrency offers more native protections and compile-time warnings to ensure thread safety.
E.g., I benchmarked 100 compute-heavy tasks in both GCD and async-await, performed twice for each (Instruments screenshot omitted).
So, you simply have to ask yourself whether the benefits of async-await warrant the modest performance impact or not.
A few unrelated asides on the GCD implementation:
It should be noted that your GCD example is not thread-safe, so the comparison of your two code snippets is not entirely fair. You should make the GCD implementation thread-safe. (Perhaps consider temporarily testing with TSAN; see the "Detect Data Races Among Your App's Threads" section of Diagnosing Memory, Thread, and Crash Issues Early.) You should perform doPartialCalculation in parallel, but you must synchronize the update of allPartialResults (or any shared resource). You can use a GCD serial queue for this. Or, since you seem to be so concerned about performance, perhaps an NSLock or os_unfair_lock (though care must be taken with the latter). See the sketch just below and the GCD example at the end of this answer.
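For example, a minimal sketch of the NSLock variant (my illustration; partialResult is a stand-in for the real work):
import Foundation

func partialResult(_ i: Int) -> Set<String> { ["result \(i)"] } // stand-in for the real calculation

let lock = NSLock()
var allResults = Set<String>()

DispatchQueue.concurrentPerform(iterations: 3) { i in
    let result = partialResult(i) // compute in parallel, outside the lock
    lock.lock()
    allResults.formUnion(result)  // only the shared mutation is serialized
    lock.unlock()
}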
If your dispatched blocks are taking ~50 msec, that simply might not be enough work to justify the overhead of concurrency. You may even find that a simple, serial, rendition is faster!
Often, to maximize the amount of work done per thread, we would “stride” through our index (which is what you appear to be doing with your “slice” logic). But if, even after striding, the time per concurrent loop is still measured in milliseconds, then it may turn out that concurrency is unwarranted altogether. Some tasks are so trivial that they simply will not benefit from concurrent execution.
In your GCD example, you are dispatching to a concurrent queue which, if you have too many iterations, can lead to "thread explosion", exhausting a very limited worker thread pool. You are only doing three iterations, so that's not a problem now, but if the number of iterations grows, you would want to abandon that pattern and adopt concurrentPerform (as seen here). It's a great way to make full use of the hardware capabilities while avoiding exhaustion of the worker thread pool.
As an aside, I would be wary of using any of the sleep methods as a proxy for a time consuming task. You actually want to keep the CPU busy. I personally use an inefficient π calculation as my general proxy for “do something slow”. That is what I used above.
import os.signpost

// "poi" is the Instruments points-of-interest log handle; the subsystem name is arbitrary
let poi = OSLog(subsystem: "com.example.tests", category: .pointsOfInterest)

func performHeavyTask(iteration: Int) {
    let id = OSSignpostID(log: poi)
    os_signpost(.begin, log: poi, name: #function, signpostID: id, "%d", iteration)
    let pi = calculatePi(iterations: 100_000_000)
    os_signpost(.end, log: poi, name: #function, signpostID: id, "%f", pi)
}
// calculate pi using Gregory-Leibniz series
func calculatePi(iterations: Int) -> Double {
var result = 0.0
var sign = 1.0
for i in 0 ..< iterations {
result += sign / Double(i * 2 + 1)
sign *= -1
}
return result * 4
}
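For reference, here is roughly how those 100 tasks were dispatched in each world (a simplified sketch, not the exact harness):
// GCD: concurrentPerform stripes the work across the worker thread pool
DispatchQueue.global().async {
    DispatchQueue.concurrentPerform(iterations: 100) { i in
        performHeavyTask(iteration: i)
    }
}

// async-await: a task group (inside some async function)
await withTaskGroup(of: Void.self) { group in
    for i in 0 ..< 100 {
        group.addTask { performHeavyTask(iteration: i) }
    }
}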
E.g. here is a GCD example which
uses concurrentPerform;
performs calculation in parallel but synchronizes array updates;
performs update of model on main thread;
uses Sequence<String> rather than [String] to eliminate expensive array creation:
func doCalculation() {
DispatchQueue.global().async { [originalData] in // it gives me the willies to see an asynchronous routine accessing a property, so I capture it here in case it ever becomes mutable; or, better, it could be a parameter of `doCalculation`
let totalLength = originalData.count
let iterations = 3 // avoid the brittle pattern of repeating this number (or values based upon it)
let sliceLength = (totalLength + iterations - 1) / iterations // round up so the final elements are not dropped when the count is not a multiple of `iterations`
let queue = DispatchQueue(label: "Calculator") // serial queue for synchronization
var allResults = Set<String>()
DispatchQueue.concurrentPerform(iterations: iterations) { i in
let start = i * sliceLength
let end = min(start + sliceLength, totalLength)
let result = self.doPartialCalculation(with: originalData[start..<end]) // do calculation in parallel
queue.sync { allResults.formUnion(result) } // synchronize update
}
// personally, I would not update a property from this method,
// but rather would use local var and supply the results in a completion
// handler parameter, and let caller update model as it sees fit.
//
// But if you are going to do this, synchronize the update somehow,
// e.g., do it on the main thread.
DispatchQueue.main.async { // update on main thread
self.calculatedData = allResults // or `self.calculatedData.formUnion(allResults)`, if that's what you really mean
}
}
}
// note, rather than taking `[String]`, which requires us to create a new
// `Array` instance, let's change this to take `Sequence<String>` as
// input ... that way we can supply array slices directly
func doPartialCalculation<S>(with data: S) -> Set<String> where S: Sequence, S.Element == String {
print("began")
sleep(2)
let someResultSet: Set<String> = ["some result"]
print("ended")
return someResultSet
}
Or, alternatively, you could do the updates of the local var asynchronously and keep track of them with a DispatchGroup, performing the final update (or call to the completion handler) on the .main queue:
func doCalculation() {
DispatchQueue.global().async { [originalData] in // it gives me the willies to see an asynchronous routine accessing a property, so I capture it here in case it ever becomes mutable; or, better, it could be a parameter of `doCalculation`
let totalLength = originalData.count
let iterations = 3 // avoid the brittle pattern of repeating this number (or values based upon it)
let sliceLength = (totalLength + iterations - 1) / iterations // round up so the final elements are not dropped when the count is not a multiple of `iterations`
let queue = DispatchQueue(label: "Calculator") // serial queue for synchronization
let group = DispatchGroup()
var allResults = Set<String>()
DispatchQueue.concurrentPerform(iterations: iterations) { i in
let start = i * sliceLength
let end = min(start + sliceLength, totalLength)
let result = self.doPartialCalculation(with: originalData[start..<end]) // do calculation in parallel
queue.async(group: group) { allResults.formUnion(result) } // synchronize update
}
// personally, I would not update a property from this method,
// but rather would use local var and supply the results in a completion
// handler parameter, and let caller update model as it sees fit.
//
// But if you are going to do this, synchronize the update somehow,
// e.g., do it on the main thread.
group.notify(queue: .main) {
self.calculatedData = allResults // or `self.calculatedData.formUnion(allResults)`, if that's what you really mean
}
}
}
You can benchmark this and see whether the asynchronous update has any material impact. It probably will not in this case, but the proof is in the pudding.
Your Task-based example looks like it should execute concurrently. I ran it and am able to get concurrent execution.
Probably the issue you're having is that Swift concurrency tries to limit Task concurrency to the number of available cores. And (I don't think this is well documented!) Swift playgrounds and the iOS simulators seem to execute in a single-core environment.
So if you run your code in a Swift playground, you'll get serial task execution. If you make a Mac app and run it there, or run on an iOS device, you should get parallel execution.
This WWDC talk from last year has a discussion of why it works that way: https://developer.apple.com/videos/play/wwdc2021/10254/?time=652
That's worth paying attention to. You'll of course be fine scheduling 3 blocks on a concurrent queue, but if your example is standing in for a real workload that might have hundreds or thousands, it's easy to cause thread explosion and create new, harder to understand performance issues.
I have a function used to create an object graph in my app for testing purposes. The data structure is very simple at present with a one-to-many relationship between Patient and ParameterMeasurement entities.
As setup of the test state involves around 800 entries, it makes sense to do this as a batch insert, which works...until you try to establish the relationship between ParameterMeasurement and Patient (which, in the reciprocal, is a to-one relationship), at which point the app crashes with the dreaded "Illegal attempt to establish a relationship 'cdPatient' between objects in different contexts".
I'm struggling to understand why this is happening as both Patient and ParameterMeasurement entities are created using the same managed object context which is passed to the function by the caller.
I've already tried to store the objectID of the Patient (created before instantiating ParameterMeasurement instances) and then creating a local copy of the Patient instance inside the batch-insert closure (code left in place below, commented out), but this does not resolve the issue. I've also checked my model (all OK, relationships are good), deleted the app, and reset the sim, but still no joy.
Finally, I've stuck in print statements to check the MOCs associated with both entities at the point of instantiation and the MOC passed to the function. As expected, the memory addresses match which makes it look like the error message is a red herring.
Can anyone point me in the right direction? This seems to have been a common issue in the past (lots of posts from 5+ years ago in Objective-C, but little in Swift), and the examples I've found don't deal with this specific scenario.
func addSampleData(to context: NSManagedObjectContext) throws {
try addParameterDefinitions(to: context, resetToDefaults: true)
let fetchRequest = ParameterProfile.fetchAll
let profiles = try context.fetch(fetchRequest)
for _ in 1...10 {
let patient = Patient(context: context)
patient.cdName = "Patient \(UUID().uuidString.split(separator: "-")[0])"
patient.cdCreationDate = Date()
// let patientID = patient.objectID
for profile in profiles {
let data: [(Date, Double)] = DataGenerator.placeholderDataForParameter(with: profile)
var idx = 0
let total = data.count
let batchInsert = NSBatchInsertRequest(entity: ParameterMeasurement.entity()) { (managedObject: NSManagedObject) -> Bool in
guard idx < total else { return true }
// let patientInContext = context.object(with: patientID) as! Patient
if let measurement = managedObject as? ParameterMeasurement {
// measurement.cdPatient = patientInContext
measurement.cdPatient = patient
measurement.cdName = profile.cdName
measurement.cdTimestamp = data[idx].0
measurement.cdValue = data[idx].1
}
idx += 1
return false
}
do {
try context.execute(batchInsert)
try context.save()
} catch {
fatalError("Import failed with error: \(error.localizedDescription)")
}
}
}
}
Core Data model definitions: [entity editor screenshots omitted]
Having done some more digging on this, it appears that batch inserts cannot be used to add relationships to the persistent store, as noted here. I'm guessing it's because of the difficulty of correctly associating entities during the process - frustrating, but not a deal breaker.
For now, I'll revert to individual insertion of entities, although I could do the process in two passes: a batch insert of the "basic" properties, then a second pass setting the relationships on the inserted entities (a sketch follows). It seems like a bit too much effort at this level, though, and any time saving is likely to be minimal for the extra code complexity (and risk of bugs!).
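For completeness, a sketch of what the two-pass version might look like (untested; rows is a hypothetical [[String: Any]] built from the generated data, without the relationship):
let batchInsert = NSBatchInsertRequest(
    entity: ParameterMeasurement.entity(),
    objects: rows // plain attributes only: cdName, cdTimestamp, cdValue
)
batchInsert.resultType = .objectIDs
let result = try context.execute(batchInsert) as? NSBatchInsertResult
let objectIDs = result?.result as? [NSManagedObjectID] ?? []

// Second pass: set the relationship on the now-persisted objects
for objectID in objectIDs {
    if let measurement = try? context.existingObject(with: objectID) as? ParameterMeasurement {
        measurement.cdPatient = patient
    }
}
try context.save()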
The following is a minimal example that demonstrates the problem I am having:
use std::borrow::BorrowMut;
use std::ops::DerefMut;
#[derive(Debug, Clone)]
enum ConnectionState {
NotStarted,
}
type StateChangedCallback = Box<dyn FnMut(ConnectionState) + Send + Sync>;
fn thread_func(mut on_state_changed: StateChangedCallback) {
let new_state = ConnectionState::NotStarted;
let f: &mut dyn BorrowMut<StateChangedCallback> = &mut on_state_changed;
f.borrow_mut().deref_mut()(new_state);
}
fn main() {
let on_state_changed = Box::new(|new_state| {
println!("New state: {:?}", new_state);
});
let join_handle = std::thread::spawn(|| thread_func(on_state_changed));
join_handle.join().unwrap();
}
I have a simple thread that needs to call a callback passed from main. The callback has the signature Box<dyn FnMut(ConnectionState) + Send + Sync>, since I want to call it multiple times. The only way I managed to call the callback was with this weird syntax:
let f: &mut dyn BorrowMut<StateChangedCallback> = &mut on_state_changed;
f.borrow_mut().deref_mut()(new_state);
I searched and did not find a reasonable explanation for this. Am I doing something wrong, or is this just the way Rust works?
If so, could someone explain the reason for this syntax?
You are overcomplicating things.
It is not clear why you think you have to call borrow_mut(), since there is no borrowing involved in your signature.
Your function thread_func can be simplified to this:
fn thread_func(mut on_state_changed: StateChangedCallback) {
let new_state = ConnectionState::NotStarted;
on_state_changed(new_state);
}
Please note that, contrary to your statement "I want to call it (the callback) multiple times", you cannot call it again from main, because you move the closure into the function. Inside thread_func, however, an FnMut can be called as often as you like.
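For example, a minimal sketch (reusing the types from the question):
fn thread_func(mut on_state_changed: StateChangedCallback) {
    // An FnMut only needs &mut self, so its owner may call it repeatedly.
    on_state_changed(ConnectionState::NotStarted);
    on_state_changed(ConnectionState::NotStarted);
}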
I tried to make a serial queue for network operations with GCD like this:
let mySerialQueue = dispatch_queue_create("com.myApp.mySerialQueue", dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_USER_INITIATED, 0))
func myFunc() {
dispatch_async(mySerialQueue) {
do {
// Get object from the database if it exists
let query = PFQuery(className: aClass)
query.whereKey(user, equalTo: currentUser)
let result = try? query.getFirstObject()
// Use existing object or create a new one
let object = result ?? PFObject(className: aClass)
object.setObject(currentUser, forKey: user)
try object.save()
} catch {
print(error)
}
}
}
The code first looks for an existing object in the database.
If it finds one, it updates it. If it doesn't find one, it creates a new one. This is using the Parse SDK and only synchronous network functions (.getFirstObject, .save).
For some reason it seems that this is not executed serially, because a new object is sometimes written into the database even though one already existed that should simply have been updated.
Am I missing something about the GCD?
From the documentation on dispatch_queue_attr_make_with_qos_class:
relative_priority: A negative offset from the maximum supported scheduler priority for the given quality-of-service class. This value must be less than 0 and greater than MIN_QOS_CLASS_PRIORITY
Therefore you should be passing in a value less than 0 for this.
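For example, a sketch keeping the same QoS class but with a valid (negative) offset:
let attr = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_USER_INITIATED, -1)
let mySerialQueue = dispatch_queue_create("com.myApp.mySerialQueue", attr)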
However, if you have no need for a priority, you can simply pass DISPATCH_QUEUE_SERIAL into the attr argument when you create your queue. For example:
let mySerialQueue = dispatch_queue_create("com.myApp.mySerialQueue", DISPATCH_QUEUE_SERIAL)
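For reference, in Swift 3 and later the same queue reduces to the following (queues are serial by default):
let mySerialQueue = DispatchQueue(label: "com.myApp.mySerialQueue", qos: .userInitiated)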
I watched with great attention the WWDC 2015 sessions about Advanced NSOperations and played a little bit with the example code.
The provided abstractions are really great, but there is something I may not have properly understood.
I would like to pass result data between two consecutive Operation subclasses without using a MOC.
Imagine I have an APIQueryOperation which has an NSData? property, and a second operation, ParseJSONOperation, consuming this property. How do I provide this NSData? instance to the second operation?
I tried something like this:
queryOperation = APIQueryOperation(request: registerAPICall)
parseOperation = ParseJSONOperation(data: queryOperation.responseData)
parseOperation.addDependency(queryOperation)
But when I enter the execute method of the ParseJSONOperation, the instance is not the same as in the initialiser.
What did I do wrong?
Your issue is that you are constructing your ParseJSONOperation with a nil value. Since you have two operations that rely on this NSData object, I would suggest you write a wrapper object to house this data.
To stay aligned with the WWDC talk, let's call this object the APIResultContext:
class APIResultContext {
var data: NSData?
}
Now we can pass this object into both the APIQueryOperation and the ParseJSONOperation, so that we have a valid object that can store the data transferred from the API.
This would make the construction of the two operations:
let context = APIResultContext()
APIQueryOperation(request: registerAPICall, context: context)
ParseJSONOperation(context: context)
Inside your ParseJSONOperation you should be able to access the data, since the dependency guarantees that the query has completed (and set the data) before the parse operation executes.
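As a sketch, the parse operation might consume the context like this (assuming the execute()/finish() pattern of the WWDC sample's Operation base class; my illustration, not the sample code):
class ParseJSONOperation: Operation { // the WWDC sample's Operation base class
    private let context: APIResultContext

    init(context: APIResultContext) {
        self.context = context
        super.init()
    }

    override func execute() {
        guard let data = context.data else {
            finish() // the query produced no data
            return
        }
        // ... parse `data` here ...
        finish()
    }
}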
Thread Safety
As @CouchDeveloper pointed out, data is not, strictly speaking, thread-safe. For this trivial example, since the two operations are dependent, we can safely write and read knowing that these accesses won't take place at the same time. However, to round out the solution and make the context thread-safe, we can add a simple NSLock:
class APIResultContext {
var data: NSData? {
set {
lock.lock()
_data = newValue
lock.unlock()
}
get {
lock.lock()
let result = _data
lock.unlock()
return result
}
}
private var _data: NSData?
private let lock = NSLock()
}