Split up Task into multiple concurrent subtasks - swift

I am trying to do some calculations on a large number of objects. The objects are saved in an array and the results of the operation should be saved in a new array. To speed up the processing, I'm trying to break up the task into multiple subtasks which can run concurrently on different threads. The simplified example code below replaces the actual operation with a two-second wait.
I have tried multiple ways of solving this issue, using both DispatchQueues and Tasks.
Using DispatchQueue
The basic setup I used is the following:
import Foundation
class Main {
let originalData = ["a", "b", "c"]
var calculatedData = Set<String>()
func doCalculation() {
//calculate length of array slices.
let totalLength = originalData.count
let sliceLength = Int(totalLength / 3)
var start = 0
var end = 0
let myQueue = DispatchQueue(label: "Calculator", attributes: .concurrent)
var allPartialResults = [Set<String>]()
for i in 0..<3 {
if i != 2 {
start = sliceLength * i
end = start + sliceLength - 1
} else {
start = totalLength - sliceLength * (i - 1)
end = totalLength - 1
}
allPartialResults.append(Set<String>())
myQueue.async {
allPartialResults[i] = self.doPartialCalculation(data: Array(self.originalData[start...end]))
}
}
myQueue.sync(flags: .barrier) {
for result in allPartialResults {
self.calculatedData.formUnion(result)
}
}
//do further calculations with the data
}
func doPartialCalculation(data: [String]) -> Set<String> {
print("began")
sleep(2)
let someResultSet: Set<String> = ["some result"]
print("ended")
return someResultSet
}
}
As expected, the Console Log is the following (with all three "ended" appearing at once, two seconds after all three "began" appeared at once):
began
began
began
ended
ended
ended
When measuring performance using os_signpost (and using real data and calculations), this approach reduces the time needed for the entire doCalculation() function to run from 40ms to around 14ms.
Note that to avoid data races when appending the results to the final calculatedData Set, I created an array of partial Data sets of which every DispatchQueue only accesses one index (which is not a solution I like and the main reason why I am not satisfied with this approach). What I would have liked to do is to call DispatchQueue.main from within myQueue and add the new data to the calculatedData Set on the main thread, however calling DispatchQueue.main.sync causes a deadlock and using the async version leads to the barrier flag not working as intended.
Using Tasks
In a second attempt, I tried using Tasks to run code concurrently. As I understand it, there are two options for running code concurrently with Tasks: async let and withTaskGroup. For the purpose of retrieving a variable number of partial results from a variable number of concurrent tasks, I figured using withTaskGroup was the best option for me.
I modified the code to look like this:
class Main {
let originalData = ["a", "b", "c"]
var calculatedData = Set<String>()
func doCalculation() async {
//calculate length of array slices.
let totalLength = originalData.count
let sliceLength = Int(totalLength / 3)
var start = 0
var end = 0
await withTaskGroup(of: Set<String>.self) { group in
for i in 0..<3 {
if i != 2 {
start = sliceLength * i
end = start + sliceLength - 1
} else {
start = totalLength - sliceLength * (i - 1)
end = totalLength - 1
}
group.addTask {
return await self.doPartialCalculation(data: Array(self.originalData[start...end]))
}
}
for await newSet in group {
calculatedData.formUnion(newSet)
}
}
//do further calculations with the data
}
func doPartialCalculation(data: [String]) async -> Set<String> {
print("began")
try? await Task.sleep(nanoseconds: UInt64(1e9))
let someResultSet: Set<String> = ["some result"]
print("ended")
return someResultSet
}
}
However, the Console Log prints the following (with every "ended" coming 2 seconds after the preceding "began"):
began
ended
began
ended
began
ended
Measuring performance using os_signpost revealed that the operation takes 40ms to complete. Therefore it is not running concurrently.
With that being said, what is the best course of action for this problem?
Using DispatchQueue, how do you call the Main Queue to avoid data races from within a queue, while at the same time preserving a barrier flag later on in the code?
Using Task, how can you actually make them run concurrently?
EDIT
Running the code on a real device instead of the simulator and changing the sleep function inside the Task from sleep() to Task.sleep(), I was able to achieve concurrent behavior in that the Console prints the expected log. However, the operation time for the task remains upwards of 40-50ms and is highly variable, sometimes reaching 200ms or more. This problem remains after adding the .userInitiated property to the Task.
Why does it take so much longer to run the same operation concurrently using Task compared to using DispatchQueue? Am I missing something?

A few observations:
One possible performance difference is that the simulator artificially constrains the “cooperative thread pool” used by async-await. See Maximum number of threads with async-await task groups. This is one cause of a lack of full concurrency (on the simulator).
In the async-await test, another factor that can affect concurrency is an actor. If an actor is enforcing serial execution, then consider declaring doPartialCalculation as nonisolated, so that it allows concurrent execution. Failure to do so can prevent any concurrent execution (with your sleep scenario, for example).
The fact that you saw a significant performance difference when you went from sleep to Task.sleep makes me wonder if you might have done this within an actor. Actors are “reentrant”, and Task.sleep suspends execution and lets the actor switch to another task. So it allows concurrency for a series of async methods.
But Task.sleep is not analogous to some computationally intensive task that will tie up the thread. Declaring the function as nonisolated, however, will achieve concurrent execution for computationally intensive processes. That can achieve performance results that are nearly equivalent to what you achieved with a GCD implementation.
That having been said, you might still find that async-await is a tiny bit slower than pure GCD implementations. Then again, Swift concurrency offers more native protections and compile-time warnings to ensure thread-safety.
E.g., I ran 100 compute-heavy tasks in both GCD and async-await, performed twice for each (the resulting image is not reproduced here).
So, you simply have to ask yourself whether the benefits of async-await warrant the modest performance impact or not.
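The `nonisolated` point above can be sketched as follows. This is a minimal illustration under assumptions, not the asker's code: the `Calculator` actor, its method names, and the uppercasing "work" are all hypothetical stand-ins. Because the partial calculation is `nonisolated`, the child tasks of the group are not serialized on the actor.

```swift
// Hypothetical actor; the names and the uppercasing "work" are illustrative only.
actor Calculator {
    // `nonisolated`: callers do not have to hop onto the actor, so several
    // child tasks can run this at the same time.
    nonisolated func doPartialCalculation(data: ArraySlice<String>) -> Set<String> {
        Set(data.map { $0.uppercased() })
    }

    func doCalculation(on data: [String], chunks: Int) async -> Set<String> {
        guard !data.isEmpty, chunks > 0 else { return [] }
        let chunkSize = (data.count + chunks - 1) / chunks   // round up so nothing is dropped
        return await withTaskGroup(of: Set<String>.self, returning: Set<String>.self) { group in
            for start in stride(from: 0, to: data.count, by: chunkSize) {
                let slice = data[start ..< min(start + chunkSize, data.count)]
                group.addTask { self.doPartialCalculation(data: slice) }
            }
            var combined = Set<String>()
            for await partial in group {
                combined.formUnion(partial)   // merged in the parent task, so no lock is needed
            }
            return combined
        }
    }
}
```

The results are merged in the parent task rather than in the child tasks, which is why no explicit synchronization appears in this sketch.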
A few unrelated asides on the GCD implementation:
It should be noted that your GCD example is not thread-safe and so the comparison of your two code snippets is not entirely fair. You should make the GCD implementation thread-safe. (Perhaps consider temporarily testing with TSAN. See the “Detect Data Races Among Your App’s Threads” section of Diagnosing Memory, Thread, and Crash Issues Early.) You should perform doPartialCalculation in parallel, but you must synchronize the update of allPartialResults (or any shared resource). You can use a GCD serial queue for this. Or, since you seem to be so concerned about performance, perhaps an NSLock or os_unfair_lock (though care must be taken with the latter). See the GCD example at the end of this answer.
If your dispatched blocks are taking ~50 msec, that simply might not be enough work to justify the overhead of concurrency. You may even find that a simple, serial, rendition is faster!
Often, to maximize the amount of work done per thread, we would “stride” through our index (which is what you appear to be doing with your “slice” logic). But if, even after striding, the time per concurrent loop is still measured in milliseconds, then it may turn out that concurrency is unwarranted altogether. Some tasks are so trivial that they simply will not benefit from concurrent execution.
In your GCD example, you are dispatching to a concurrent queue, which if you have too many iterations, can lead to “thread explosion”, exhausting a very limited worker thread pool. You are only doing three iterations, so that’s not a problem now, but if the number of iterations grows, you would want to abandon that pattern, and adopt concurrentPerform (as seen here). It’s a great way to make full use of the hardware capabilities while avoiding the exhausting of the worker thread pool.
As an aside, I would be wary of using any of the sleep methods as a proxy for a time consuming task. You actually want to keep the CPU busy. I personally use an inefficient π calculation as my general proxy for “do something slow”. That is what I used above.
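import os.signpost
// `poi` is not defined in this snippet; a points-of-interest log is assumed here, and the subsystem string is a placeholder
let poi = OSLog(subsystem: "com.example.app", category: .pointsOfInterest)
func performHeavyTask(iteration: Int) {
let id = OSSignpostID(log: poi)
os_signpost(.begin, log: poi, name: #function, signpostID: id, "%d", iteration)
let pi = calculatePi(iterations: 100_000_000)
os_signpost(.end, log: poi, name: #function, signpostID: id, "%f", pi)
}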
// calculate pi using Gregory-Leibniz series
func calculatePi(iterations: Int) -> Double {
var result = 0.0
var sign = 1.0
for i in 0 ..< iterations {
result += sign / Double(i * 2 + 1)
sign *= -1
}
return result * 4
}
E.g. here is a GCD example which
uses concurrentPerform;
performs calculation in parallel but synchronizes array updates;
performs update of model on main thread;
uses Sequence<String> rather than [String] to eliminate expensive array creation:
func doCalculation() {
DispatchQueue.global().async { [originalData] in // gives me the willies to see asynchronous routine accessing property, so I might capture it here in case it ever changes to mutable property; or, better, it should be parameter of `doCalculation`
let totalLength = originalData.count
let iterations = 3 // avoid the brittle pattern of repeating this number (or values based upon it) throughout the code
let sliceLength = totalLength / iterations
let queue = DispatchQueue(label: "Calculator") // serial queue for synchronization
var allResults = Set<String>()
DispatchQueue.concurrentPerform(iterations: iterations) { i in
let start = i * sliceLength
let end = (i == iterations - 1) ? totalLength : start + sliceLength // the last slice also takes any remainder
let result = self.doPartialCalculation(with: originalData[start..<end]) // do calculation in parallel
queue.sync { allResults.formUnion(result) } // synchronize update
}
// personally, I would not update a property from this method,
// but rather would use local var and supply the results in a completion
// handler parameter, and let caller update model as it sees fit.
//
// But if you are going to do this, synchronize the update somehow,
// e.g., do it on the main thread.
DispatchQueue.main.async { // update on main thread
self.calculatedData = allResults // or `self.calculatedData.formUnion(allResults)`, if that's what you really mean
}
}
}
// note, rather than taking `[String]`, which requires us to create a new
// `Array` instance, let's change this to take `Sequence<String>` as
// input ... that way we can supply array slices directly
func doPartialCalculation<S>(with data: S) -> Set<String> where S: Sequence, S.Element == String {
print("began")
sleep(2)
let someResultSet: Set<String> = ["some result"]
print("ended")
return someResultSet
}
Or, alternatively, you could do the updates of the local var asynchronously and keep track of them with a DispatchGroup, performing the final update (or call to the completion handler) on the .main queue:
func doCalculation() {
DispatchQueue.global().async { [originalData] in // gives me the willies to see asynchronous routine accessing property, so I might capture it here in case it ever changes to mutable property; or, better, it should be parameter of `doCalculation`
let totalLength = originalData.count
let iterations = 3 // avoid the brittle pattern of repeating this number (or values based upon it) throughout the code
let sliceLength = totalLength / iterations
let queue = DispatchQueue(label: "Calculator") // serial queue for synchronization
let group = DispatchGroup()
var allResults = Set<String>()
DispatchQueue.concurrentPerform(iterations: iterations) { i in
let start = i * sliceLength
let end = (i == iterations - 1) ? totalLength : start + sliceLength // the last slice also takes any remainder
let result = self.doPartialCalculation(with: originalData[start..<end]) // do calculation in parallel
queue.async(group: group) { allResults.formUnion(result) } // synchronize update
}
// personally, I would not update a property from this method,
// but rather would use local var and supply the results in a completion
// handler parameter, and let caller update model as it sees fit.
//
// But if you are going to do this, synchronize the update somehow,
// e.g., do it on the main thread.
group.notify(queue: .main) {
self.calculatedData = allResults // or `self.calculatedData.formUnion(allResults)`, if that's what you really mean
}
}
}
You can benchmark this and see whether the asynchronous update has any material impact. It probably will not in this case, but the proof is in the pudding.
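As a variation on the serial-queue synchronization above, the `NSLock` option mentioned earlier might look like the following. This is only a sketch: `unionOfUppercased` and the uppercasing are hypothetical stand-ins for the real calculation and are not part of the original code.

```swift
import Foundation

// Hypothetical helper: computes partial results in parallel and merges them
// into one set, guarding the shared set with an NSLock.
func unionOfUppercased(_ data: [String], iterations: Int) -> Set<String> {
    precondition(iterations > 0, "need at least one iteration")
    let lock = NSLock()
    var allResults = Set<String>()
    let sliceLength = data.count / iterations

    DispatchQueue.concurrentPerform(iterations: iterations) { i in
        let start = i * sliceLength
        // The last slice also takes any remainder.
        let end = (i == iterations - 1) ? data.count : start + sliceLength
        let partial = Set(data[start..<end].map { $0.uppercased() })  // parallel part

        lock.lock()
        defer { lock.unlock() }
        allResults.formUnion(partial)                                 // synchronized part
    }
    // concurrentPerform returns only after every iteration has finished.
    return allResults
}
```

Only the merge is inside the lock; the calculation itself stays outside, so the lock is held as briefly as possible.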

Your Task-based example looks like it should execute concurrently. I ran it and am able to get concurrent execution.
Probably the issue you're having is that Swift concurrency tries to limit Task concurrency to the number of available cores. And (I don't think this is well documented!) Swift playgrounds and the iOS simulators seem to execute in a single-core environment.
So if you run your code in a Swift playground, you'll get serial task execution. If you make a Mac app and run it in that, or on an iOS device, you should get parallel execution.
This WWDC talk from last year has a discussion of why it works that way: https://developer.apple.com/videos/play/wwdc2021/10254/?time=652
That's worth paying attention to. You'll of course be fine scheduling 3 blocks on a concurrent queue, but if your example is standing in for a real workload that might have hundreds or thousands, it's easy to cause thread explosion and create new, harder to understand performance issues.
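One way to keep the number of work items bounded is to derive it from the core count instead of from the data size. The sketch below is an assumption-laden illustration: `chunkRanges` is a hypothetical helper, and using `ProcessInfo.processInfo.activeProcessorCount` as the default limit is just one reasonable choice, not something prescribed by the talk.

```swift
import Foundation

// Hypothetical helper: splits `count` elements into at most `maxChunks`
// contiguous ranges whose sizes differ by no more than one element.
func chunkRanges(count: Int,
                 maxChunks: Int = ProcessInfo.processInfo.activeProcessorCount) -> [Range<Int>] {
    guard count > 0, maxChunks > 0 else { return [] }
    let chunks = min(count, maxChunks)
    let base = count / chunks       // minimum size of every chunk
    let extra = count % chunks      // the first `extra` chunks get one more element

    var ranges: [Range<Int>] = []
    var start = 0
    for i in 0..<chunks {
        let length = base + (i < extra ? 1 : 0)
        ranges.append(start ..< start + length)
        start += length
    }
    return ranges
}
```

Each range can then be handed to one `group.addTask` call or one `concurrentPerform` iteration, so hundreds of thousands of elements still produce only a handful of work items.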

Related

Why does DispatchQueue.sync cause Data race?

Based on my printed output in the console window, the work in que2 was only executed after the que1 fully finished its work, so my question is why did I get the Data race warning even though the first block of work in que1 was completely synchronous?
Data race in closure #2 () -> () in BlankSwift at BlankSwift.porsche : BlankSwift.Car
struct Car {
var name: String
}
let que1 = DispatchQueue(label: "que1", qos: .background)
let que2 = DispatchQueue(label: "que2", qos: .userInteractive)
var porsche = Car(name: "Porsche")
for i in 0...100 {
que1.sync {
porsche.name = "porsche1"
print(porsche.name)
porsche.name = "Porsche11"
print(porsche.name)
if i == 100 { print("returned ")}
}
que2.async {
porsche.name = "porsche2"
print(porsche.name)
porsche.name = "Porsche22"
print(porsche.name)
}
}
While que1.sync is indeed called synchronously, que2.async is asynchronous on a different queue, so it schedules its closure and immediately returns, at which point you go to the next iteration of the loop.
There is some latency before the que2 closure begins executing. So for example, the closure for que2.async that was scheduled for iteration 0, is likely to start executing while que1.sync is executing for some later iteration, let's say iteration 10.
Not only that, que2 may well have multiple tasks queued up before the first one begins. It's a serial queue, because you didn't specify the .concurrent attribute, so you don't have to worry about que2 tasks racing with another que2 closure's access of porsche.name, but they definitely can race with que1 closure accesses.
As for output ordering, ultimately the output will go to FileHandle.standardOutput to which the OS has a buffer attached, and you don't know what kind of synchronization scheme the OS uses to order writes to that buffer. It may well use its own call to DispatchQueue.async to ensure that I/O is done in a sensible way, much the way UI updates on AppKit/UIKit have to be done on the main thread.
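One way to remove the race described above is to route every access to the shared value through a single serial queue. The following is a minimal sketch, assuming a hypothetical `SynchronizedCar` wrapper and `runRenames` function that are not part of the question:

```swift
import Foundation

struct Car {
    var name: String
}

// Hypothetical wrapper: every read and write of the car goes through one serial queue.
final class SynchronizedCar: @unchecked Sendable {
    private var car = Car(name: "Porsche")
    private let accessQueue = DispatchQueue(label: "car.access")

    var name: String {
        accessQueue.sync { car.name }
    }

    func rename(to newName: String) {
        accessQueue.sync { car.name = newName }
    }
}

func runRenames() -> String {
    let porsche = SynchronizedCar()
    let que1 = DispatchQueue(label: "que1", qos: .background)
    let que2 = DispatchQueue(label: "que2", qos: .userInteractive)
    let group = DispatchGroup()

    for _ in 0...100 {
        que1.sync { porsche.rename(to: "porsche1") }
        que2.async(group: group) { porsche.rename(to: "porsche2") }
    }
    group.wait()          // wait for the asynchronous que2 work before reading
    return porsche.name   // either name is possible; the order is still not fixed
}
```

This removes the data race but not the ordering question: which rename lands last still depends on scheduling.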

DispatchQueue: why does serial complete faster than concurrent?

I have a unit test setup to prove that concurrently performing multiple heavy tasks is faster than serial.
Now... before everyone in here loses their minds over the fact that the above statement is not always correct because multithreading comes with many uncertainties, let me explain.
I know from reading the Apple documentation that you cannot guarantee you get multiple threads when asking for them. The OS (iOS) will assign threads however it sees fit. If the device only has one core, for example, it will assign one core, and serial will be slightly faster because the initialisation code of the concurrent operation takes some extra time whilst not delivering a performance improvement.
However: This difference should only be slight. But in my POC setup the difference is massive. In my POC, concurrent is slower by about 1/3 of the time.
If serial completes in 6 seconds, concurrent will complete in 9 seconds.
This trend continues even with heavier loads. If serial completes in 125 seconds, concurrent will complete in 215 seconds. This also happens not just once but consistently, every time.
I wonder if I made a mistake in creating this POC, and if so, how should I prove that concurrently performing multiple heavy tasks is indeed faster than serial?
My POC in swift unit tests:
func performHeavyTask(_ completion: (() -> Void)?) {
var counter = 0
while counter < 50000 {
print(counter)
counter = counter.advanced(by: 1)
}
completion?()
}
// MARK: - Serial
func testSerial () {
let start = DispatchTime.now()
let _ = DispatchQueue.global(qos: .userInitiated)
let mainDPG = DispatchGroup()
mainDPG.enter()
DispatchQueue.global(qos: .userInitiated).async {[weak self] in
guard let self = self else { return }
for _ in 0...10 {
self.performHeavyTask(nil)
}
mainDPG.leave()
}
mainDPG.wait()
let end = DispatchTime.now()
let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds // <<<<< Difference in nano seconds (UInt64)
print("NanoTime: \(nanoTime / 1_000_000_000)")
}
// MARK: - Concurrent
func testConcurrent() {
let start = DispatchTime.now()
let _ = DispatchQueue.global(qos: .userInitiated)
let mainDPG = DispatchGroup()
mainDPG.enter()
DispatchQueue.global(qos: .userInitiated).async {
let dispatchGroup = DispatchGroup()
let _ = DispatchQueue.global(qos: .userInitiated)
DispatchQueue.concurrentPerform(iterations: 10) { index in
dispatchGroup.enter()
self.performHeavyTask({
dispatchGroup.leave()
})
}
dispatchGroup.wait()
mainDPG.leave()
}
mainDPG.wait()
let end = DispatchTime.now()
let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds // <<<<< Difference in nano seconds (UInt64)
print("NanoTime: \(nanoTime / 1_000_000_000)")
}
Details:
OS: macOS High Sierra
Model Name: MacBook Pro
Model Identifier: MacBookPro11,4
Processor Name: Intel Core i7
Processor Speed: 2,2 GHz
Number of Processors: 1
Total Number of Cores: 4
Both tests were done on iPhone XS Max simulator. Both tests were done straight after a reboot of the entire mac was done (to avoid the mac being busy with applications other than running this unit test, blurring results)
Also, both unit tests are wrapped in an async DispatchWorkItem because the test case is for the main (UI) queue not to be blocked. This prevents the serial test case from having an advantage on that part, as it would otherwise consume the main queue instead of a background queue as the concurrent test case does.
I'll also accept an answer that shows a POC reliably testing this. It does not have to show concurrent is faster than serial all the time (read the above explanation as to why not), but at least some of the time.
There are two issues:
I’d avoid doing print inside the loop. That’s synchronized and you’re likely to experience greater performance degradation in concurrent implementation. That’s not the whole story here, but it doesn’t help.
Even after removing the print from within the loop, 50,000 increments of the counter is simply not enough work to see the benefit of concurrentPerform. As Improving on Loop Code says:
... And although this [concurrentPerform] can be a good way to improve performance in loop-based code, you must still use this technique discerningly. Although dispatch queues have very low overhead, there are still costs to scheduling each loop iteration on a thread. Therefore, you should make sure your loop code does enough work to warrant the costs. Exactly how much work you need to do is something you have to measure using the performance tools.
On debug build, I needed to increase number of iterations to values closer to 5,000,000 before this overhead was overcome. And on release build, even that wasn’t sufficient. A spinning loop and incrementing a counter is just too quick to offer meaningful analysis of concurrent behavior.
So, in my example below, I replaced this spinning loop with a more computationally intensive calculation (calculating π using a historic, but not terribly efficient, algorithm).
As an aside:
Rather than measuring the performance yourself, if you do this within an XCTestCase unit test, you can use measure to benchmark performance. This repeats the benchmarking multiple times, captures elapsed time, averages the results, etc. Just make sure to edit your scheme so the test action uses an optimized “release” build rather than a “debug” build.
There’s no point in dispatching this to a global queue if you’re going to use dispatch group to make the calling thread wait for it to complete.
You don’t need to use dispatch groups to wait for concurrentPerform to finish, either. It runs synchronously.
As the concurrentPerform documentation says:
The dispatch queue executes the submitted block the specified number of times and waits for all iterations to complete before returning.
It’s not really material, but it’s worth noting that your for _ in 0...10 { ... } is doing 11 iterations, not 10. You obviously meant to use ..<.
Thus, here is an example, putting it in a unit test, but replacing the “heavy” calculation with something more computationally intensive:
import XCTest
class MyAppTests: XCTestCase {
// calculate pi using Gregory-Leibniz series
func calculatePi(iterations: Int) -> Double {
var result = 0.0
var sign = 1.0
for i in 0 ..< iterations {
result += sign / Double(i * 2 + 1)
sign *= -1
}
return result * 4
}
func performHeavyTask(iteration: Int) {
let pi = calculatePi(iterations: 100_000_000)
print(iteration, .pi - pi)
}
func testSerial() {
measure {
for i in 0..<10 {
self.performHeavyTask(iteration: i)
}
}
}
func testConcurrent() {
measure {
DispatchQueue.concurrentPerform(iterations: 10) { i in
self.performHeavyTask(iteration: i)
}
}
}
}
On my MacBook Pro 2018 with 2.9 GHz Intel Core i9, with a release build the concurrent test took, on average, 0.247 seconds, whereas the serial test took roughly four times as long, 1.030 seconds.

Are Swift4 variables atomic?

I was wondering if Swift 4 variables are atomic or not. So I did the following test.
The following is my test code.
class Test {
var count = 0
let lock = NSLock()
func testA() {
count = 0
let queueA = DispatchQueue(label: "Q1")
let queueB = DispatchQueue(label: "Q2")
let queueC = DispatchQueue(label: "Q3")
queueA.async {
for _ in 1...1000 {
self.increase()
}
}
queueB.async {
for _ in 1...1000 {
self.increase()
}
}
queueC.async {
for _ in 1...1000 {
self.increase()
}
}
}
///The increase() method:
func increase() {
// lock.lock()
self.count += 1
print(count)
// lock.unlock()
}
}
The output is as follows with lock.lock() and lock.unlock() commented out.
3
3
3
4
5
...
2999
3000
The output is as follows with lock.lock() and lock.unlock() uncommented.
1
2
3
4
5
...
2999
3000
My Problem
If the count variable is nonatomic, queueA, queueB and queueC should call increase() asynchronously, which should result in count being accessed and printed in random order.
So, in my mind, there should be a moment where, for example, queueA and queueB both read count as 15 and both increase it by 1 (count += 1), so count ends up as 16 even though two increments were executed.
But the three queues above only count erratically at the very beginning; after that, everything proceeds in order.
To conclude, my question is: why is count printed in order?
Update:
The problem is solved. If you want to repeat the experiment, make the following changes.
1. Change increase() to the version below, and you will get reasonable output.
// assumes `var array = [Int]()` has been added to `Test`
func increase() {
lock.lock()
self.count += 1
array.append(self.count)
lock.unlock()
}
2. The output method:
@IBAction func tapped(_ sender: Any) {
let testObj = Test()
testObj.testA()
DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + 3) {
print(testObj.array)
}
}
Output without NSLock: (not reproduced here)
Output with NSLock:
[1,2,3,...,2999,3000]
No, Swift properties are not atomic by default, and yes, it's likely you'll run into multi-threading issues, where multiple threads use an outdated value of that property, a property which just got updated.
But before we get to the chase, let's see what an atomic property is.
An atomic property is one that has an atomic setter, i.e. while the setter does its job, other threads that want to access (get or set) the property are blocked.
Now in your code we are not talking about an atomic property, as the += operation is actually split into at least three operations:
get the current value, store it in a CPU register
increment that CPU register
store the incremented value into the property
And even if the setter were atomic, we could end up in a situation where two threads "simultaneously" reach #1 and try to operate on the same value.
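The three steps above, and how two threads can both start from the same value, can be traced by hand. The sketch below does not use real threads: `lostUpdateDemo` is a hypothetical function that plays out one unlucky interleaving on a single thread so the result is reproducible.

```swift
// Deterministic walk-through of a lost update; the two "threads" are simulated
// by hand on a single thread so the interleaving is fixed.
func lostUpdateDemo() -> Int {
    var count = 15

    let registerA = count              // thread A, step 1: reads 15
    let registerB = count              // thread B, step 1: also reads 15
    let incrementedA = registerA + 1   // thread A, step 2: increments its copy
    let incrementedB = registerB + 1   // thread B, step 2: increments its copy
    count = incrementedA               // thread A, step 3: stores 16
    count = incrementedB               // thread B, step 3: stores 16 again

    return count                       // 16, although two increments ran
}
```

With real threads the interleaving is not fixed, which is why the unlocked counter gives a different total from run to run.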
So the question here should be: is increase() an atomic operation?
Now back to the actual code, it's the print call that "rescues" you. An increment-and-store operation takes a very short amount of time, while printing takes much longer. This is why you do not seem to run into race conditions, as the window where multiple threads can use an outdated value is quite small.
Try the following: comment out the print call as well, and print the count value after an amount of time large enough for all background threads to finish (2 seconds should be enough for 1000 iterations):
let t = Test()
t.testA()
DispatchQueue.main.asyncAfter(deadline: .now() + 2.0) {
// you're likely to get different results each run
print(t.count)
}
RunLoop.current.run()
You'll see now that the locked version gives consistent results, while the non-locked one doesn't.
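Instead of waiting a fixed two seconds, a `DispatchGroup` can report when all three queues are done. This is a minimal sketch, assuming a hypothetical `LockedCounter` class and `runCounterTest` function in place of the question's `Test` class:

```swift
import Foundation

// Hypothetical counter whose increment is protected by an NSLock.
final class LockedCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    func increase() {
        lock.lock()
        defer { lock.unlock() }
        value += 1               // read, increment and store happen under the lock
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

func runCounterTest() -> Int {
    let counter = LockedCounter()
    let group = DispatchGroup()

    // Three independent serial queues, as in the question.
    for label in ["Q1", "Q2", "Q3"] {
        DispatchQueue(label: label).async(group: group) {
            for _ in 1...1000 {
                counter.increase()
            }
        }
    }

    group.wait()                 // blocks until all three queues have finished
    return counter.current
}
```

Blocking with `group.wait()` is fine for a command-line experiment; in an app, `group.notify(queue: .main)` avoids blocking the calling thread.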

Swift global variables thread safe

I know, global variables are "not sexy", but I have few in my current project. I played around with Xcode's Thread Sanitizer and found a data race on them. So I tried to make them thread safe.
As I also want a single point of management for these variables, I tried to do the GCD stuff in the getter and setter of the variables.
Finally I found a solution that worked, was accepted by the compiler and the Thread Sanitizer was happy .... BUT... this solution looks quite ugly (see below) and is very slow (did a performance test and it was VERY slow).
Yes, I know, if I use classes for this it might be more "swifty", but there must be an easy solution for a thread safe global variable.
So would you be so kind and give hints and suggestions to optimize this attempt? Any hint / idea / suggestion / comment is welcome!
// I used a "computed variable", to overcome the compiler errors,
// we need a helper variable to store the actual value.
var globalVariable_Value : Int = 0
// this is the global "variable" we worked with
var globalVariable : Int {
// the setter
set (newValue) {
globalDataQueue.async(flags: .barrier) {
globalVariable_Value = newValue
}
}
// the getter
get {
// we need a helper variable to store the result.
// inside a void closure you are not allowed to "return"
var result : Int = 0
globalDataQueue.sync{
result = globalVariable_Value
}
return result
}
}
// usage
globalVariable = 1
print ("globalVariable == \(globalVariable)")
globalVariable += 1
print ("globalVariable == \(globalVariable)")
// output
// globalVariable == 1
// globalVariable == 2
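Note that `globalVariable += 1` in the usage above performs a synchronized read followed by a separate synchronized write, so the increment as a whole is not atomic when several threads do it at once. A minimal sketch of one way around that, assuming a hypothetical `SynchronizedValue` wrapper (not part of the question) that runs the whole read-modify-write inside a single barrier block:

```swift
import Foundation

// Hypothetical wrapper around a value guarded by a concurrent queue:
// reads may overlap, writes run exclusively as barrier blocks.
final class SynchronizedValue<Value>: @unchecked Sendable {
    private var value: Value
    private let queue = DispatchQueue(label: "synchronized.value", attributes: .concurrent)

    init(_ value: Value) {
        self.value = value
    }

    func read() -> Value {
        queue.sync { value }
    }

    // The closure sees the current value and may change it in place;
    // nothing else can read or write while it runs.
    func mutate(_ transform: (inout Value) -> Void) {
        queue.sync(flags: .barrier) { transform(&value) }
    }
}
```

Usage would be along the lines of `counter.mutate { $0 += 1 }` instead of `counter.value += 1`. The barrier write here is synchronous, which differs from the asynchronous setter in the question.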
OOPer asked me to redo the performance tests as he found the result strange.
Well, he was right. I did write a simple app (Performance Test App on GitHub) and attached some screenshots.
I ran the tests on an iPhone SE with the latest iOS. The app was started on the device, not in Xcode. Compiler settings were "debug" for all shown test results. I also tested with "full optimizations" (smallest fastest [-Os]), but the results were very similar. I think in such simple tests there is not much to optimize.
The test app simply runs the tests described in the answer above. To make it a little bit more realistic, it is doing each test on three parallel async dispatched queues with the qos classes .userInteractive, .default and .background.
There might be better ways to test such things. But for the purpose of this question, I think it's good enough.
I'm happy if anybody would reassess the code and maybe find better test algorithms ... we all would learn from it. I stop my work on this now.
The results are quite strange in my eyes. All three different approaches give roughly the same performance. On each run there was another "hero", so I assume it is just influenced by other background tasks etc. So even Itai Ferber's "nice" solution has no benefit in practice. It's "just" more elegant code.
And yes, the thread-safe solution is WAY slower than the non-queued solution.
This is the main learning: Yes, it's possible to make global variables thread safe, but there is a significant performance issue with it.
EDIT: I leave this first answer in to keep the history, but a hint from OOPer has led to a totally different view (see next answer).
First of all: I'm quite impressed how fast and well educated the answers flow in (we are on a weekend!)
So the suggestion of Itai Ferber was a very good one, and as he asked, I did some performance tests, just to give him something in return ;-)
I ran the test with the attached code in a playground. As you can see, this is by far not a well-designed performance test; it is just a simple test to get a gist of the performance impact. I did several iterations (see table below).
Again: I did it in a Playground, so the absolute times will be much better in a "real" app, but the differences between the tests will be very similar.
Key findings:
Iterations show linear behavior (as expected)
"My" solution (test1) is about 15 times slower than an "un-queued" global variable (test0)
I did a test where I used an additional global variable as the helper variable (test2); this is slightly faster, but not a real breakthrough
The suggested solution from Itai Ferber (test3) is about 6 to 7 times slower than the pure global variable (test0)... so it is twice as fast as "my" solution
So alternative 3 not only looks better, as it doesn't need the overhead of the helper variable, it is also faster.
// the queue to synchronize data access, it's a concurrent one
fileprivate let globalDataQueue = DispatchQueue(
label: "com.ACME.globalDataQueue",
attributes: .concurrent)
// ------------------------------------------------------------------------------------------------
// Base Version: Just a global variable
// this is the global "variable" we worked with
var globalVariable : Int = 0
// ------------------------------------------------------------------------------------------------
// Alternative 1: with concurrent queue, helper variable inside getter
// As I used a calculated variable, to overcome the compiler errors, we need a helper variable
// to store the actual value.
var globalVariable1_Value : Int = 0
// this is the global "variable" we worked with
var globalVariable1 : Int {
set (newValue) {
globalDataQueue.async(flags: .barrier) {
globalVariable1_Value = newValue
}
}
get {
// we need a helper variable to store the result.
// inside a void closure you are not allowed to "return"
var globalVariable1_Helper : Int = 0
globalDataQueue.sync{
globalVariable1_Helper = globalVariable1_Value
}
return globalVariable1_Helper
}
}
// ------------------------------------------------------------------------------------------------
// Alternative 2: with concurrent queue, helper variable as additional global variable
// As I used a computed variable, to overcome the compiler errors, we need a helper variable
// to store the actual value.
var globalVariable2_Value : Int = 0
var globalVariable2_Helper : Int = 0
// this is the global "variable" we worked with
var globalVariable2 : Int {
    // the setter
    set (newValue) {
        globalDataQueue.async(flags: .barrier) {
            globalVariable2_Value = newValue
        }
    }
    // the getter
    get {
        // note: concurrent readers all write this shared helper, which is itself a data race
        globalDataQueue.sync {
            globalVariable2_Helper = globalVariable2_Value
        }
        return globalVariable2_Helper
    }
}
// ------------------------------------------------------------------------------------------------
// Alternative 3: with concurrent queue, no helper variable as Itai Ferber suggested
// "compact" design
var globalVariable3_Value : Int = 0
var globalVariable3 : Int {
    set (newValue) {
        globalDataQueue.async(flags: .barrier) { globalVariable3_Value = newValue }
    }
    get {
        return globalDataQueue.sync { globalVariable3_Value }
    }
}
// ------------------------------------------------------------------------------------------------
// -- Testing
// variable for read test
var testVar = 0
let numberOfInterations = 2
// Test 0
print ("\nStart test0: simple global variable, not thread safe")
let startTime = CFAbsoluteTimeGetCurrent()
for _ in 0 ..< numberOfInterations {
    testVar = globalVariable
    globalVariable += 1
}
let endTime = CFAbsoluteTimeGetCurrent()
let timeDiff = endTime - startTime
print("globalVariable == \(globalVariable), test0 time needed \(timeDiff) seconds")
// Test 1
testVar = 0
print ("\nStart test1: concurrent queue, helper variable inside getter")
let startTime1 = CFAbsoluteTimeGetCurrent()
for _ in 0 ..< numberOfInterations {
    testVar = globalVariable1
    globalVariable1 += 1
}
let endTime1 = CFAbsoluteTimeGetCurrent()
let timeDiff1 = endTime1 - startTime1
print("globalVariable == \(globalVariable1), test1 time needed \(timeDiff1) seconds")
// Test 2
testVar = 0
print ("\nStart test2: with concurrent queue, helper variable as an additional global variable")
let startTime2 = CFAbsoluteTimeGetCurrent()
for _ in 0 ..< numberOfInterations {
    testVar = globalVariable2
    globalVariable2 += 1
}
let endTime2 = CFAbsoluteTimeGetCurrent()
let timeDiff2 = endTime2 - startTime2
print("globalVariable == \(globalVariable2), test2 time needed \(timeDiff2) seconds")
// Test 3
testVar = 0
print ("\nStart test3: with concurrent queue, no helper variable as Itai Ferber suggested")
let startTime3 = CFAbsoluteTimeGetCurrent()
for _ in 0 ..< numberOfInterations {
    testVar = globalVariable3
    globalVariable3 += 1
}
let endTime3 = CFAbsoluteTimeGetCurrent()
let timeDiff3 = endTime3 - startTime3
print("globalVariable == \(globalVariable3), test3 time needed \(timeDiff3) seconds")
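The pattern behind Alternative 3 can be packaged into a reusable type. The following is only a sketch under stated assumptions: the names `Synchronized` and `mutate` are hypothetical and do not come from the answer above. It also addresses one thing the test does not exercise: `globalVariable3 += 1` runs the getter and then the setter as two separate queue operations, so increments issued from several threads at once could overwrite each other. Wrapping the whole read-modify-write in a single barrier block avoids that.

```swift
import Foundation

/// Hypothetical generic wrapper (not part of the original answer) around the
/// concurrent-queue-plus-barrier pattern used by Alternative 3.
final class Synchronized<Value>: @unchecked Sendable {
    private var storage: Value
    private let queue = DispatchQueue(label: "com.ACME.synchronized", attributes: .concurrent)

    init(_ value: Value) {
        storage = value
    }

    /// Reads may overlap with other reads, but never with a barrier write.
    var value: Value {
        return queue.sync { storage }
    }

    /// Runs the complete read-modify-write inside one barrier block,
    /// so no other reader or writer can interleave with it.
    func mutate(_ transform: (inout Value) -> Void) {
        queue.sync(flags: .barrier) { transform(&storage) }
    }
}

let counter = Synchronized(0)

// Increment from several threads at once.
// concurrentPerform returns only after all iterations have finished.
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    counter.mutate { $0 += 1 }
}

print("counter == \(counter.value)")
```

The trade-off is that `mutate` blocks the caller until the barrier has run, whereas the asynchronous setter in Alternative 3 returns immediately.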

Waiting for asynchronous calls in a swift script

I'm writing a Swift script to be run in the terminal that dispatches a couple of operations to a background thread. Without any extra effort, once all my dispatching is done, the code reaches the end of the file and quits, killing my background operations as well. What is the best way to keep the Swift script alive until my background operations are finished?
The best I have come up with is the following, but I do not believe it is the best way, or even correct.
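var semaphores = [dispatch_semaphore_t]()
while x { // x: loop condition, not shown
    let semaphore = dispatch_semaphore_create(0)
    semaphores.append(semaphore)
    dispatch_background { // helper not shown; runs the closure in the background
        //do lengthy operation
        dispatch_semaphore_signal(semaphore)
    }
}
// wait for every background operation to signal before reaching the end of the file
for semaphore in semaphores {
    dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
}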
In addition to using dispatch_groups, you can also do the following:
yourAsyncTask(completion: {
    exit(0) // end the process once the work is done
})
RunLoop.main.run() // keeps the script alive until exit(0) is called
Some resources:
RunLoop docs
Example from the swift-sh project
Exit code meanings
Thanks to Aaron Brager, who linked to "Multiple workers in Swift Command Line Tool", which is what I used to find my answer: using dispatch_groups to solve the problem.
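The answer above names dispatch groups but does not show the code. A minimal sketch in current Swift syntax could look like the following; the `Results` class, the three task IDs, and the 0.1 second sleep are assumptions made for illustration, not the code from the linked question.

```swift
import Foundation

/// Hypothetical lock-protected collector for results arriving from several threads.
final class Results: @unchecked Sendable {
    private let lock = NSLock()
    private var items = [Int]()

    func append(_ item: Int) {
        lock.lock()
        defer { lock.unlock() }
        items.append(item)
    }

    var sorted: [Int] {
        lock.lock()
        defer { lock.unlock() }
        return items.sorted()
    }
}

let results = Results()
let group = DispatchGroup()

for taskID in 1...3 {
    // Passing the group to async enters it now and leaves it when the block returns.
    DispatchQueue.global(qos: .utility).async(group: group) {
        Thread.sleep(forTimeInterval: 0.1) // stand-in for the lengthy operation
        results.append(taskID)
    }
}

// Block the main thread until every task has left the group,
// so the script does not reach the end of the file early.
group.wait()
print("finished tasks: \(results.sorted)")
```

Compared with one semaphore per task, the group keeps a single counter of outstanding work, so there is no array of semaphores to maintain.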
How about something like this:
func runThingsInTheBackground() {
    var semaphores = [dispatch_semaphore_t]()
    for delay in [2, 3, 10, 7] {
        let semaphore = dispatch_semaphore_create(0)
        semaphores.append(semaphore)
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
            sleep(UInt32(delay))
            println("Task took \(delay) seconds")
            dispatch_semaphore_signal(semaphore)
        }
    }
    for semaphore in semaphores {
        dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
    }
}
This is very similar to what you have. My work 'queue' is an array of seconds to sleep, so that you can see that things are happening in the background.
Do note that this just runs all the tasks in the background. If you want to limit the number of active tasks to, for example, the number of CPU cores, then you have to do a little more work.
Not sure if that is what you were looking for, let me know.
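One possible way to do the "little more work" of capping the number of active tasks is an `OperationQueue` with `maxConcurrentOperationCount` set to the core count. This is a sketch in current Swift syntax, not the answerer's method; the `Counter` class and the shortened delays are assumptions made for illustration.

```swift
import Foundation

/// Hypothetical lock-protected counter, used only to confirm that every task ran.
final class Counter: @unchecked Sendable {
    private let lock = NSLock()
    private var count = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        count += 1
    }

    var value: Int {
        lock.lock()
        defer { lock.unlock() }
        return count
    }
}

let delays = [0.2, 0.3, 0.1, 0.4] // seconds; shortened stand-ins for the delays above
let completed = Counter()

let workers = OperationQueue()
// Assumption: one running task per active CPU core is the desired limit.
workers.maxConcurrentOperationCount = ProcessInfo.processInfo.activeProcessorCount

for delay in delays {
    workers.addOperation {
        Thread.sleep(forTimeInterval: delay)
        print("Task took \(delay) seconds")
        completed.increment()
    }
}

// Keeps the script alive until the queue has drained.
workers.waitUntilAllOperationsAreFinished()
print("completed \(completed.value) of \(delays.count) tasks")
```

Operations beyond the limit wait in the queue until a running one finishes, so no manual semaphore bookkeeping is needed.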