DispatchQueue: why does serial complete faster than concurrent? - swift

I have a unit test setup to prove that concurrently performing multiple heavy tasks is faster than serial.
Now... before everyone in here loses their minds over the fact that the above statement is not always correct because multithreading comes with many uncertainties, let me explain.
I know from reading the Apple documentation that you cannot guarantee you get multiple threads when asking for them. The OS (iOS) will assign threads however it sees fit. If the device only has one core, for example, it will assign one core, and serial will be slightly faster, because the setup code for the concurrent operation takes extra time while delivering no performance improvement on a single core.
However: this difference should only be slight. But in my POC setup the difference is massive. In my POC, concurrent takes roughly one and a half times as long as serial.
If serial completes in 6 seconds, concurrent completes in 9 seconds.
This trend continues even with heavier loads: if serial completes in 125 seconds, concurrent completes in 215 seconds. And this happens not just once, but consistently, every single time.
I wonder if I made a mistake in creating this POC, and if so, how should I prove that concurrently performing multiple heavy tasks is indeed faster than serial?
My POC in swift unit tests:
func performHeavyTask(_ completion: (() -> Void)?) {
    var counter = 0
    while counter < 50000 {
        print(counter)
        counter = counter.advanced(by: 1)
    }
    completion?()
}
// MARK: - Serial
func testSerial() {
    let start = DispatchTime.now()
    let _ = DispatchQueue.global(qos: .userInitiated)
    let mainDPG = DispatchGroup()

    mainDPG.enter()
    DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        guard let self = self else { return }
        for _ in 0...10 {
            self.performHeavyTask(nil)
        }
        mainDPG.leave()
    }
    mainDPG.wait()

    let end = DispatchTime.now()
    let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds // <<<<< Difference in nano seconds (UInt64)
    print("NanoTime: \(nanoTime / 1_000_000_000)")
}
// MARK: - Concurrent
func testConcurrent() {
    let start = DispatchTime.now()
    let _ = DispatchQueue.global(qos: .userInitiated)
    let mainDPG = DispatchGroup()

    mainDPG.enter()
    DispatchQueue.global(qos: .userInitiated).async {
        let dispatchGroup = DispatchGroup()
        let _ = DispatchQueue.global(qos: .userInitiated)

        DispatchQueue.concurrentPerform(iterations: 10) { index in
            dispatchGroup.enter()
            self.performHeavyTask({
                dispatchGroup.leave()
            })
        }
        dispatchGroup.wait()
        mainDPG.leave()
    }
    mainDPG.wait()

    let end = DispatchTime.now()
    let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds // <<<<< Difference in nano seconds (UInt64)
    print("NanoTime: \(nanoTime / 1_000_000_000)")
}
Details:
OS: macOS High Sierra
Model Name: MacBook Pro
Model Identifier: MacBookPro11,4
Processor Name: Intel Core i7
Processor Speed: 2,2 GHz
Number of Processors: 1
Total Number of Cores: 4
Both tests were run on the iPhone XS Max simulator, straight after a full reboot of the Mac (to avoid the Mac being busy with applications other than this unit test and blurring the results).
Also, both unit tests are wrapped in an async DispatchWorkItem so that the main (UI) queue is not blocked. This prevents the serial test case from having an advantage there, since otherwise it would consume the main queue instead of a background queue, as the concurrent test case does.
I'll also accept an answer that shows a POC that reliably tests this. It does not have to show that concurrent is faster than serial all the time (see the explanation above as to why not), but at least some of the time.

There are two issues:
I'd avoid doing print inside the loop. That's synchronized, and you're likely to experience greater performance degradation in the concurrent implementation. That's not the whole story here, but it doesn't help.
Even after removing the print from within the loop, 50,000 increments of the counter is simply not enough work to see the benefit of concurrentPerform. As Improving on Loop Code says:
... And although this [concurrentPerform] can be a good way to improve performance in loop-based code, you must still use this technique discerningly. Although dispatch queues have very low overhead, there are still costs to scheduling each loop iteration on a thread. Therefore, you should make sure your loop code does enough work to warrant the costs. Exactly how much work you need to do is something you have to measure using the performance tools.
On a debug build, I needed to increase the number of iterations to values closer to 5,000,000 before this overhead was overcome. And on a release build, even that wasn't sufficient. Spinning in a loop and incrementing a counter is just too quick to offer meaningful analysis of concurrent behavior.
So, in my example below, I replaced this spinning loop with a more computationally intensive calculation (calculating π using a historic, but not terribly efficient, algorithm).
As an aside:
Rather than measuring the performance yourself, if you do this within a XCTestCase unit test, you can use measure to benchmark performance. This repeats the benchmarking multiple times, captures elapsed time, averages the results, etc. Just make sure to edit your scheme so the test action uses an optimized “release” build rather than a “debug” build.
There’s no point in dispatching this to a global queue if you’re going to use dispatch group to make the calling thread wait for it to complete.
You don’t need to use dispatch groups to wait for concurrentPerform to finish, either. It runs synchronously.
As the concurrentPerform documentation says:
The dispatch queue executes the submitted block the specified number of times and waits for all iterations to complete before returning.
It’s not really material, but it’s worth noting that your for _ in 0...10 { ... } is doing 11 iterations, not 10. You obviously meant to use ..<.
Thus, here is an example, putting it in a unit test, but replacing the “heavy” calculation with something more computationally intensive:
class MyAppTests: XCTestCase {
    // calculate pi using Gregory-Leibniz series
    func calculatePi(iterations: Int) -> Double {
        var result = 0.0
        var sign = 1.0
        for i in 0 ..< iterations {
            result += sign / Double(i * 2 + 1)
            sign *= -1
        }
        return result * 4
    }

    func performHeavyTask(iteration: Int) {
        let pi = calculatePi(iterations: 100_000_000)
        print(iteration, .pi - pi)
    }

    func testSerial() {
        measure {
            for i in 0..<10 {
                self.performHeavyTask(iteration: i)
            }
        }
    }

    func testConcurrent() {
        measure {
            DispatchQueue.concurrentPerform(iterations: 10) { i in
                self.performHeavyTask(iteration: i)
            }
        }
    }
}
On my MacBook Pro 2018 with 2.9 GHz Intel Core i9, with a release build the concurrent test took, on average, 0.247 seconds, whereas the serial test took roughly four times as long, 1.030 seconds.

Related

Why does DispatchQueue.sync cause Data race?

Based on my printed output in the console window, the work in que2 was only executed after que1 had fully finished its work. So my question is: why did I get the data race warning even though the first block of work in que1 was completely synchronous?
Data race in closure #2 () -> () in BlankSwift at BlankSwift.porsche : BlankSwift.Car
struct Car {
    var name: String
}

let que1 = DispatchQueue(label: "que1", qos: .background)
let que2 = DispatchQueue(label: "que2", qos: .userInteractive)

var porsche = Car(name: "Porsche")

for i in 0...100 {
    que1.sync {
        porsche.name = "porsche1"
        print(porsche.name)
        porsche.name = "Porsche11"
        print(porsche.name)
        if i == 100 { print("returned ") }
    }
    que2.async {
        porsche.name = "porsche2"
        print(porsche.name)
        porsche.name = "Porsche22"
        print(porsche.name)
    }
}
While que1.sync is indeed called synchronously, que2.async is asynchronous on a different queue, so it schedules its closure and immediately returns, at which point you go to the next iteration of the loop.
There is some latency before the que2 closure begins executing. So, for example, the closure that que2.async scheduled for iteration 0 is likely to start executing while que1.sync is executing for some later iteration, let's say iteration 10.
Not only that, que2 may well have multiple tasks queued up before the first one begins. It's a serial queue, because you didn't specify the .concurrent attribute, so you don't have to worry about que2 tasks racing with another que2 closure's access of porsche.name, but they definitely can race with que1 closure accesses.
As for output ordering, ultimately the output goes to FileHandle.standardOutput, to which the OS has a buffer attached, and you don't know what kind of synchronization scheme the OS uses to order writes to that buffer. It may well use its own call to DispatchQueue.async to ensure that I/O is done in a sensible way, much the way UI updates in AppKit/UIKit have to be done on the main thread.
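If the goal is simply to eliminate the race (rather than just understand it), one common pattern is to funnel every access to the shared Car through a single serial queue. A minimal sketch of that idea follows; the isolation queue name is mine, not from the question:
import Foundation

struct Car {
    var name: String
}

let que1 = DispatchQueue(label: "que1", qos: .background)
let que2 = DispatchQueue(label: "que2", qos: .userInteractive)
let isolation = DispatchQueue(label: "car.isolation") // the only queue allowed to touch `porsche`

var porsche = Car(name: "Porsche")

for _ in 0...100 {
    que1.sync {
        // Every read and write of `porsche` goes through the same serial queue,
        // so no two closures can touch it at the same time.
        isolation.sync {
            porsche.name = "porsche1"
            print(porsche.name)
        }
    }
    que2.async {
        isolation.sync {
            porsche.name = "porsche2"
            print(porsche.name)
        }
    }
}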

Best way to write large text file incrementally in Swift

I'm writing a fairly large text file (it's actually more like ascii-encoded data), and it's... very slow. And uses a lot of memory.
Here's a minimalistic version of the code I'm using to test how to write files more quickly. writeFileIncrementally writes one line at a time in a for loop, while writeFileFromBigData creates a large string and then dumps it to disk. I was fully expecting writeFileFromBigData to be faster, but it's 20 times faster! That's a bit more than I expected. For size=10_000_000, it takes 20-25 seconds to write it incrementally and 1-1.5 seconds to write it one go. Plus, the incremental version actually allocates more and more memory as it goes. By the end of it, it's well into the GiB range. I don't understand what's going on here.
func writeFileIncrementally(toUrl url: URL, size: Int) {
    // ensure file exists and is empty
    try? "".write(to: url, atomically: true, encoding: .ascii)
    guard let handle = try? FileHandle(forWritingTo: url) else { return }
    defer {
        handle.closeFile()
    }
    for i in 0..<size {
        let s = "\(i)\n"
        handle.write(s.data(using: .ascii)!)
    }
}

func writeFileFromBigData(toUrl url: URL, size: Int) {
    let s = (0..<size).map { String($0) }.joined(separator: "\n")
    try? s.write(to: url, atomically: true, encoding: .ascii)
}
Compare that to the same thing in Python. The create-string-then-write-it is faster in Python as well. That's reasonable, but the difference in Python is it takes about 2.7 seconds to write it incrementally (about 98% user time) and about 1 second to write it in one go (including creating the string). Additionally, the incremental version has constant memory usage. It does not go up as the file is being written.
def writeFileIncrementally(path, size):
    with open(path, "w+") as f:
        for i in range(size):
            f.write(f"{i}\n")

def writeFileFromBigData(path, size):
    with open(path, "w+") as f:
        f.write("\n".join(str(i) for i in range(size)))
So my question is twofold:
Why is my writeFileIncrementally function so slow and why does it use so much memory? I was hoping to be able to write incrementally to reduce memory usage.
Is there some better approach for incrementally writing a large text file in Swift?
For memory, see Duncan C's answer. You need an autoreleasepool. But for speed, you have a small problem and a large problem.
The small problem is this line:
handle.write(s.data(using: .ascii)!)
Rewriting that will save about 40% of your time (from 27s to 17s in my tests):
handle.write(Data(s.utf8))
Strings are generally stored internally in UTF8. While ASCII is a perfect subset of that, your code requires checking for anything that isn't ASCII. Using .utf8 can often just grab the internal buffer directly. It also avoids creating and unwrapping an Optional.
But 17s is still a lot more than 1-2s. That's due to your big problem.
Every call to write has to get the data all the way to the OS's file buffers. Not all the way to the disk, but still, it's an expensive operation. Unless the data is precious, you generally want to chunk it into larger blocks (4k is very common). If you do this, the write time goes down to 1.5s:
let bufferSize = 4 * 1024
var buffer = Data(capacity: bufferSize)

for i in 0..<size {
    autoreleasepool {
        let s = "\(i)\n"
        buffer.append(contentsOf: s.utf8)
        if buffer.count >= bufferSize {
            handle.write(buffer)
            buffer.removeAll(keepingCapacity: true)
        }
    }
}

// Write the final buffer
handle.write(buffer)
This is "pretty close" to the the "big data" function's 1.1s on my system. There's still a lot of memory allocation and cleanup going on. And in my experience, at least recently, [UInt8] is much faster than Data. I'm not sure that was always true, but all my most recent tests on Mac go that way. So, writing with the newer write(contentsOf:) interface is:
let bufferSize = 4 * 1024
var buffer: [UInt8] = []
buffer.reserveCapacity(bufferSize)

for i in 0..<size {
    autoreleasepool {
        let s = "\(i)\n"
        buffer.append(contentsOf: s.utf8)
        if buffer.count >= bufferSize {
            try? handle.write(contentsOf: buffer)
            buffer.removeAll(keepingCapacity: true)
        }
    }
}

// Write the final buffer
try? handle.write(contentsOf: buffer)
And that's faster than the big data function, because it doesn't have to make a Data. (830ms on my machine)
But wait, it gets better. This code doesn't need the autorelease pool, and if you remove that, I can write this file in 730ms.
let bufferSize = 4 * 1024
var buffer: [UInt8] = []
buffer.reserveCapacity(bufferSize)

for i in 0..<size {
    let s = "\(i)\n"
    buffer.append(contentsOf: s.utf8)
    if buffer.count >= bufferSize {
        try? handle.write(contentsOf: buffer)
        buffer.removeAll(keepingCapacity: true)
    }
}

// Write the final buffer
try? handle.write(contentsOf: buffer)
But what about Python? Why doesn't it need buffers to be fast? Because it gives you buffers by default. Your open call returns a BufferedWriter with an 8k buffer that works more or less like the above code. You'd need to write in binary mode and also pass buffering=0 to turn it off. See the docs on open for the details.
I'm not sure why the incremental writing version is so slow.
If you're worried about memory use, though, you could make your memory footprint much smaller by wrapping your inner loop with a call to autoreleasepool():
for i in 0..<size {
    autoreleasepool {
        let s = "\(i)\n"
        handle.write(s.data(using: .ascii)!)
        if i.isMultiple(of: 100000) {
            print(i)
        }
    }
}
(Internally, Swift's ARC memory management sometimes allocates temporary storage on the heap as "autoreleased", which means it sticks around in memory until the current call chain returns and the app revisits the event loop. If you have a processing loop that allocates a whole bunch of local variables, they can accumulate on the heap until you finish and return. It's really only a problem if you push up against the memory limits of the device, however.)
Edit:
I think this might be a case of premature optimization, however. It looks to me like the max memory consumption for the "write it all at once" version with 10,000,000 items is about 150 MB, which is not a problem for a device that's able to run current iOS versions. Just use the "write all at once" version and be done with it. If you need to write billions of lines at once, then write hybrid code that breaks the work into chunks of 10 million at a time and appends each chunk to the file (with the inner loop wrapped in a call to autoreleasepool(), as shown above).
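For what it's worth, a rough sketch of that hybrid idea might look like the following. The function name, chunk size, and use of FileHandle are my own choices for illustration, not code from the question or measured in any way:
import Foundation

// Hypothetical hybrid: build the text in large chunks, appending each chunk to the file.
func writeFileInChunks(toUrl url: URL, size: Int, chunkSize: Int = 10_000_000) {
    // Ensure the file exists and is empty.
    try? "".write(to: url, atomically: true, encoding: .ascii)
    guard let handle = try? FileHandle(forWritingTo: url) else { return }
    defer { handle.closeFile() }

    var start = 0
    while start < size {
        let end = min(start + chunkSize, size)
        autoreleasepool {
            // Build one big string for this chunk, then write it with a single call.
            let chunk = (start..<end).map { String($0) }.joined(separator: "\n") + "\n"
            handle.write(Data(chunk.utf8))
        }
        start = end
    }
}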

Split up Task into multiple concurrent subtasks

I am trying to do some calculations on a large number of objects. The objects are saved in an array and the results of the operation should be saved in a new array. To speed up the processing, I'm trying to break up the task into multiple subtasks which can run concurrently on different threads. The simplified example code below replaces the actual operation with two seconds of wait.
I have tried multiple ways of solving this issue, using both DispatchQueues and Tasks.
Using DispatchQueue
The basic setup I used is the following:
import Foundation

class Main {
    let originalData = ["a", "b", "c"]
    var calculatedData = Set<String>()

    func doCalculation() {
        // calculate length of array slices.
        let totalLength = originalData.count
        let sliceLength = Int(totalLength / 3)
        var start = 0
        var end = 0

        let myQueue = DispatchQueue(label: "Calculator", attributes: .concurrent)
        var allPartialResults = [Set<String>]()

        for i in 0..<3 {
            if i != 2 {
                start = sliceLength * i
                end = start + sliceLength - 1
            } else {
                start = totalLength - sliceLength * (i - 1)
                end = totalLength - 1
            }
            allPartialResults.append(Set<String>())
            myQueue.async {
                allPartialResults[i] = self.doPartialCalculation(data: Array(self.originalData[start...end]))
            }
        }

        myQueue.sync(flags: .barrier) {
            for result in allPartialResults {
                self.calculatedData.formUnion(result)
            }
        }

        // do further calculations with the data
    }

    func doPartialCalculation(data: [String]) -> Set<String> {
        print("began")
        sleep(2)
        let someResultSet: Set<String> = ["some result"]
        print("ended")
        return someResultSet
    }
}
As expected, the Console Log is the following (with all three "ended" appearing at once, two seconds after all three "began" appeared at once):
began
began
began
ended
ended
ended
When measuring performance using os_signpost (and using real data and calculations), this approach reduces the time needed for the entire doCalculation() function to run from 40ms to around 14ms.
Note that to avoid data races when appending the results to the final calculatedData Set, I created an array of partial result Sets, of which every dispatched block only accesses one index (which is not a solution I like, and the main reason why I am not satisfied with this approach). What I would have liked to do is call DispatchQueue.main from within myQueue and add the new data to the calculatedData Set on the main thread; however, calling DispatchQueue.main.sync causes a deadlock, and using the async version leads to the barrier flag not working as intended.
Using Tasks
In a second attempt, I tried using Tasks to run code concurrently. As I understand it, there are two options for running code concurrently with Tasks: async let and withTaskGroup. For the purpose of retrieving a variable number of partial results from a variable number of concurrent tasks, I figured using withTaskGroup was the best option for me.
I modified the code to look like this:
class Main {
    let originalData = ["a", "b", "c"]
    var calculatedData = Set<String>()

    func doCalculation() async {
        // calculate length of array slices.
        let totalLength = originalData.count
        let sliceLength = Int(totalLength / 3)
        var start = 0
        var end = 0

        await withTaskGroup(of: Set<String>.self) { group in
            for i in 0..<3 {
                if i != 2 {
                    start = sliceLength * i
                    end = start + sliceLength - 1
                } else {
                    start = totalLength - sliceLength * (i - 1)
                    end = totalLength - 1
                }
                group.addTask {
                    return await self.doPartialCalculation(data: Array(self.originalData[start...end]))
                }
            }
            for await newSet in group {
                calculatedData.formUnion(newSet)
            }
        }

        // do further calculations with the data
    }

    func doPartialCalculation(data: [String]) async -> Set<String> {
        print("began")
        try? await Task.sleep(nanoseconds: UInt64(1e9))
        let someResultSet: Set<String> = ["some result"]
        print("ended")
        return someResultSet
    }
}
However, the Console Log prints the following (with every "ended" coming 2 seconds after the preceding "began"):
began
ended
began
ended
began
ended
Measuring performance using os_signpost revealed that the operation takes 40ms to complete. Therefore it is not running concurrently.
With that being said, what is the best course of action for this problem?
Using DispatchQueue, how do you call the Main Queue to avoid data races from within a queue, while at the same time preserving a barrier flag later on in the code?
Using Task, how can you actually make them run concurrently?
EDIT
Running the code on a real device instead of the simulator, and changing the sleep function inside the Task from sleep() to Task.sleep(), I was able to achieve concurrent behavior, in that the Console prints the expected log. However, the operation time for the task remains upwards of 40-50ms and is highly variable, sometimes reaching 200ms or more. This problem remains after adding the .userInitiated priority to the Task.
Why does it take so much longer to run the same operation concurrently using Task compared to using DispatchQueue? Am I missing something?
A few observations:
One possible performance difference is that the simulator artificially constrains the “cooperative thread pool” used by async-await. See Maximum number of threads with async-await task groups. This is one cause of a lack of full concurrency (on the simulator).
In the async-await test, another factor that can affect concurrency is an actor. If an actor is enforcing serial execution, then consider declaring doPartialCalculation as nonisolated, so that it allows concurrent execution. Failure to do so can prevent any concurrent execution (with your sleep scenario, for example).
The fact that you saw a significant performance difference when you went from sleep to Task.sleep makes me wonder if you might have done this within an actor. Actors are "reentrant", and Task.sleep suspends execution and lets the actor switch to another task. So it allows concurrency for a series of async methods.
But Task.sleep is not analogous to some computationally intensive task that will tie up the thread. For computationally intensive processes, declaring the function as nonisolated is what achieves concurrent execution. That can achieve performance results that are nearly equivalent to what you achieved with a GCD implementation.
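For illustration, here is a minimal sketch of that idea (the type name and the stand-in computation are mine, not from the question): the compute routine is nonisolated, so the task-group children can run it in parallel, while the final merge still happens on the actor.
actor Calculator {
    private(set) var calculatedData = Set<String>()

    // Being nonisolated, this CPU-bound routine is not serialized on the actor,
    // so multiple task-group children can execute it at the same time.
    nonisolated func doPartialCalculation(data: ArraySlice<String>) -> Set<String> {
        Set(data.map { $0.uppercased() }) // stand-in for the real heavy work
    }

    func doCalculation(originalData: [String]) async {
        let sliceLength = max(1, originalData.count / 3)
        await withTaskGroup(of: Set<String>.self) { group in
            var start = 0
            while start < originalData.count {
                let end = min(start + sliceLength, originalData.count)
                let slice = originalData[start..<end]
                group.addTask { self.doPartialCalculation(data: slice) }
                start = end
            }
            // Collect partial results back on the actor, so this mutation is safe.
            for await partial in group {
                calculatedData.formUnion(partial)
            }
        }
    }
}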
That having been said, you might still find that async-await is a tiny bit slower than pure GCD implementations. Then again, Swift concurrency offers more native protections and compile-time warnings to ensure thread safety.
E.g., I benchmarked 100 compute-heavy tasks in both GCD and async-await, performed twice for each, and the async-await rendition was only modestly slower.
So you simply have to ask yourself whether the benefits of async-await warrant that modest performance impact or not.
A few unrelated asides on the GCD implementation:
It should be noted that your GCD example is not thread-safe, so the comparison of your two code snippets is not entirely fair. You should make the GCD implementation thread-safe. (Perhaps consider temporarily testing with TSAN. See the "Detect Data Races Among Your App's Threads" section of Diagnosing Memory, Thread, and Crash Issues Early.) You should perform doPartialCalculation in parallel, but you must synchronize the update of allPartialResults (or any shared resource). You can use a GCD serial queue for this. Or, since you seem to be so concerned about performance, perhaps an NSLock or os_unfair_lock (though care must be taken with the latter). See the GCD example at the end of this answer.
If your dispatched blocks are taking ~50 msec, that simply might not be enough work to justify the overhead of concurrency. You may even find that a simple, serial, rendition is faster!
Often, to maximize the amount of work done per thread, we would “stride” through our index (which is what you appear to be doing with your “slice” logic). But if, even after striding, the time per concurrent loop is still measured in milliseconds, then it may turn out that concurrency is unwarranted altogether. Some tasks are so trivial that they simply will not benefit from concurrent execution.
In your GCD example, you are dispatching to a concurrent queue, which if you have too many iterations, can lead to “thread explosion”, exhausting a very limited worker thread pool. You are only doing three iterations, so that’s not a problem now, but if the number of iterations grows, you would want to abandon that pattern, and adopt concurrentPerform (as seen here). It’s a great way to make full use of the hardware capabilities while avoiding the exhausting of the worker thread pool.
As an aside, I would be wary of using any of the sleep methods as a proxy for a time consuming task. You actually want to keep the CPU busy. I personally use an inefficient π calculation as my general proxy for “do something slow”. That is what I used above.
// Note: `poi` is assumed to be an OSLog "points of interest" handle defined elsewhere,
// e.g. OSLog(subsystem: "...", category: .pointsOfInterest).
func performHeavyTask(iteration: Int) {
    let id = OSSignpostID(log: poi)
    os_signpost(.begin, log: poi, name: #function, signpostID: id, "%d", iteration)
    let pi = calculatePi(iterations: 100_000_000)
    os_signpost(.end, log: poi, name: #function, signpostID: id, "%f", pi)
}

// calculate pi using Gregory-Leibniz series
func calculatePi(iterations: Int) -> Double {
    var result = 0.0
    var sign = 1.0
    for i in 0 ..< iterations {
        result += sign / Double(i * 2 + 1)
        sign *= -1
    }
    return result * 4
}
E.g. here is a GCD example which
uses concurrentPerform;
performs calculation in parallel but synchronizes array updates;
performs update of model on main thread;
uses Sequence<String> rather than [String] to eliminate expensive array creation:
func doCalculation() {
    DispatchQueue.global().async { [originalData] in // gives me the willies to see asynchronous routine accessing property, so I might capture it here in case it ever changes to mutable property; or, better, it should be parameter of `doCalculation`
        let totalLength = originalData.count
        let iterations = 3 // avoid brittle pattern of repeating this number (or values based upon it) repeatedly
        let sliceLength = totalLength / iterations
        let queue = DispatchQueue(label: "Calculator") // serial queue for synchronization
        var allResults = Set<String>()

        DispatchQueue.concurrentPerform(iterations: iterations) { i in
            let start = i * sliceLength
            let end = min(start + sliceLength, totalLength)
            let result = self.doPartialCalculation(with: originalData[start..<end]) // do calculation in parallel
            queue.sync { allResults.formUnion(result) } // synchronize update
        }

        // personally, I would not update a property from this method,
        // but rather would use local var and supply the results in a completion
        // handler parameter, and let caller update model as it sees fit.
        //
        // But if you are going to do this, synchronize the update somehow,
        // e.g., do it on the main thread.

        DispatchQueue.main.async { // update on main thread
            self.calculatedData = allResults // or `self.calculatedData.formUnion(allResults)`, if that's what you really mean
        }
    }
}

// note, rather than taking `[String]`, which requires us to create a new
// `Array` instance, let's change this to take `Sequence<String>` as
// input ... that way we can supply array slices directly

func doPartialCalculation<S>(with data: S) -> Set<String> where S: Sequence, S.Element == String {
    print("began")
    sleep(2)
    let someResultSet: Set<String> = ["some result"]
    print("ended")
    return someResultSet
}
Or, alternatively, you could do the updates of the local var asynchronously and keep track of them with a DispatchGroup, performing the final update (or call to the completion handler) on the .main queue:
func doCalculation() {
    DispatchQueue.global().async { [originalData] in // gives me the willies to see asynchronous routine accessing property, so I might capture it here in case it ever changes to mutable property; or, better, it should be parameter of `doCalculation`
        let totalLength = originalData.count
        let iterations = 3 // avoid brittle pattern of repeating this number (or values based upon it) repeatedly
        let sliceLength = totalLength / iterations
        let queue = DispatchQueue(label: "Calculator") // serial queue for synchronization
        let group = DispatchGroup()
        var allResults = Set<String>()

        DispatchQueue.concurrentPerform(iterations: iterations) { i in
            let start = i * sliceLength
            let end = min(start + sliceLength, totalLength)
            let result = self.doPartialCalculation(with: originalData[start..<end]) // do calculation in parallel
            queue.async(group: group) { allResults.formUnion(result) } // synchronize update
        }

        // personally, I would not update a property from this method,
        // but rather would use local var and supply the results in a completion
        // handler parameter, and let caller update model as it sees fit.
        //
        // But if you are going to do this, synchronize the update somehow,
        // e.g., do it on the main thread.

        group.notify(queue: .main) {
            self.calculatedData = allResults // or `self.calculatedData.formUnion(allResults)`, if that's what you really mean
        }
    }
}
You can benchmark this and see whether the asynchronous update has any material impact. It probably will not in this case, but the proof is in the pudding.
Your Task-based example looks like it should execute concurrently. I ran it and am able to get concurrent execution.
Probably the issue you're having is that Swift concurrency tries to limit Task concurrency to the number of available cores. And (I don't think this is well documented!) Swift playgrounds and the iOS simulators seem to execute in a single-core environment.
So if you run your code in a Swift playground, you'll get serial task execution. If you make a Mac app and run it in that, or on an iOS device, you should get parallel execution.
This WWDC talk from last year has a discussion of why it works that way: https://developer.apple.com/videos/play/wwdc2021/10254/?time=652
That's worth paying attention to. You'll of course be fine scheduling 3 blocks on a concurrent queue, but if your example is standing in for a real workload that might have hundreds or thousands, it's easy to cause thread explosion and create new, harder to understand performance issues.

Are Swift4 variables atomic?

I was wondering whether Swift 4 variables are atomic or not, so I did the following test.
Here is my test code:
class Test {
    var count = 0
    let lock = NSLock()

    func testA() {
        count = 0
        let queueA = DispatchQueue(label: "Q1")
        let queueB = DispatchQueue(label: "Q2")
        let queueC = DispatchQueue(label: "Q3")

        queueA.async {
            for _ in 1...1000 {
                self.increase()
            }
        }
        queueB.async {
            for _ in 1...1000 {
                self.increase()
            }
        }
        queueC.async {
            for _ in 1...1000 {
                self.increase()
            }
        }
    }

    /// The increase() method:
    func increase() {
        // lock.lock()
        self.count += 1
        print(count)
        // lock.unlock()
    }
}
The output is as follows with lock.lock() and lock.unlock() commented out:
3
3
3
4
5
...
2999
3000
The output is as follows with lock.lock() and lock.unlock() uncommented:
1
2
3
4
5
...
2999
3000
My Problem
If the count variable is nonatomic, then queueA, queueB and queueC asynchronously calling increase() should result in count being accessed and printed in a random order.
So, in my mind, there should be a moment where, for example, queueA and queueB both read count as, say, 15, and both of them increase it by 1 (count += 1), so count ends up as 16 even though two increments were executed.
But the three queues above only start counting in a random order at the very beginning; after that, everything proceeds in order, as if nothing were wrong.
To conclude, my question is why count is printed orderly?
Update:
The problem is solved. If you want to do the experiment the way I did, make the following changes.
1. Change increase() to the code below, and you will get reasonable output.
// Note: this assumes a `var array = [Int]()` property was added to the Test class.
func increase() {
    lock.lock()
    self.count += 1
    array.append(self.count)
    lock.unlock()
}
2. The output method:
@IBAction func tapped(_ sender: Any) {
    // `testObj` is a `Test` property of this view controller, so it outlives the tap.
    testObj.testA()
    DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + 3) {
        print(self.testObj.array)
    }
}
Output without NSLock: (screenshot not reproduced here)
Output with NSLock:
[1,2,3,...,2999,3000]
No, Swift properties are not atomic by default, and yes, it's likely you'll run into multi-threading issues, where multiple threads use an outdated value of that property, a property which just got updated.
But before we get to the chase, let's see what an atomic property is.
An atomic property is one that has an atomic setter - i.e. while the setter does its job, other threads that want to access (get or set) the property are blocked.
Now in your code we are not talking about an atomic property, as the += operation is actually split into at least three operations:
get the current value, store it in a CPU register
increment that CPU register
store the incremented value into the property
And even if the setter were atomic, we could end up in a situation where two threads "simultaneously" reach #1 and try to operate on the same value.
So the question here should be: is increase() an atomic operation?
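To make that distinction concrete, here is a small sketch (the type and property names are mine): even if every individual get and set of count is protected by a lock, the read-modify-write of += can still interleave; only locking the whole operation makes increase() atomic.
import Foundation

final class Counter {
    private var count = 0
    private let lock = NSLock()

    // "Atomic property" style: each get and each set is individually locked.
    // Two threads can still both read 15 and both write back 16.
    var atomicishCount: Int {
        get { lock.lock(); defer { lock.unlock() }; return count }
        set { lock.lock(); defer { lock.unlock() }; count = newValue }
    }

    func brokenIncrease() {
        atomicishCount += 1 // the read and the write are each atomic, but the combination is not
    }

    // Atomic *operation*: the entire read-modify-write happens under one lock.
    func increase() {
        lock.lock()
        count += 1
        lock.unlock()
    }
}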
Now, back to the actual code: it's the print call that "rescues" you. An increment-and-store operation takes a very short amount of time, while printing takes much longer. This is why you don't seem to run into race conditions; the window in which multiple threads can use an outdated value is quite small.
Try the following: comment out the print call, and instead print the count value after an amount of time long enough for all background threads to finish (2 seconds should be enough for 1000 iterations):
let t = Test()
t.testA()
DispatchQueue.main.asyncAfter(deadline: .now() + 2.0) {
    // you're likely to get different results each run
    print(t.count)
}
RunLoop.current.run()
You'll see now that the locked version gives consistent results, while the non-locked one doesn't.

Waiting for asynchronous calls in a swift script

I'm writing a Swift script, to be run in Terminal, that dispatches a couple of operations to a background thread. Without any extra effort, after all my dispatching is done, the code reaches the end of the file and quits, killing my background operations as well. What is the best way to keep the Swift script alive until my background operations are finished?
The best I have come up with is the following, but I do not believe this is the best way, or even correct.
var semaphores = [dispatch_semaphore_t]()

while x {
    var semaphore = dispatch_semaphore_create(0)
    semaphores.append(semaphore)
    dispatch_background {
        // do lengthy operation
        dispatch_semaphore_signal(semaphore)
    }
}

for semaphore in semaphores {
    dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
}
In addition to using dispatch_groups, you can also do the following:
yourAsyncTask(completion: {
    exit(0)
})
RunLoop.main.run()
RunLoop.main.run()
Some resources:
RunLoop docs
Example from the swift-sh project
Exit code meanings
Thanks to Aaron Brager, who linked to Multiple workers in Swift Command Line Tool, which is what I used to find my answer: using dispatch groups to solve the problem.
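For reference, the dispatch-group idea in current Swift syntax would look roughly like this (a sketch of the same pattern, not the exact code from the linked answer):
import Foundation

let group = DispatchGroup()

for delay in [2, 3, 10, 7] {
    group.enter()
    DispatchQueue.global(qos: .background).async {
        sleep(UInt32(delay)) // stand-in for the lengthy operation
        print("Task took \(delay) seconds")
        group.leave()
    }
}

group.wait() // block the script until every task has left the group
print("All background work finished")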
How about something like this:
func runThingsInTheBackground() {
    var semaphores = [dispatch_semaphore_t]()

    for delay in [2, 3, 10, 7] {
        var semaphore = dispatch_semaphore_create(0)
        semaphores.append(semaphore)

        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
            sleep(UInt32(delay))
            println("Task took \(delay) seconds")
            dispatch_semaphore_signal(semaphore)
        }
    }

    for semaphore in semaphores {
        dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
    }
}
This is very similar to what you have. My work 'queue' is an array of seconds to sleep, so that you can see that things are happening in the background.
Do note that this just runs all the tasks in the background. If you want to limit the number of active tasks to, for example, the number of CPU cores, then you have to do a little more work (one way of doing that is sketched after this answer).
Not sure if that is what you were looking for, let me know.
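One simple way to cap the number of simultaneously running tasks, shown here in current Swift syntax as a sketch rather than a drop-in solution, is a counting semaphore initialized to the core count:
import Foundation

let maxConcurrent = ProcessInfo.processInfo.activeProcessorCount
let throttle = DispatchSemaphore(value: maxConcurrent) // at most `maxConcurrent` tasks run at once
let group = DispatchGroup()

for delay in [2, 3, 10, 7, 1, 4] {
    throttle.wait() // don't dispatch more work until a slot frees up
    group.enter()
    DispatchQueue.global(qos: .background).async {
        sleep(UInt32(delay)) // stand-in for the lengthy operation
        print("Task took \(delay) seconds")
        throttle.signal()
        group.leave()
    }
}

group.wait() // keep the script alive until everything has finished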