How to get concurrency when using AsyncLines - swift

I'm trying to use AsyncLineSequence with Process to execute many instances of a shell script at the same time. The issue I'm seeing is that, with my usage of AsyncLineSequence, I'm not seeing the output of the Process invocations interleaved as I would expect. It feels like there is something fundamental I'm misunderstanding, as this seems like it should work to me.
Here's a reproduction in a playground:
import Cocoa

DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
    exit(EXIT_SUCCESS)
}

func run(label: String) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/yes")

    let pipe = Pipe()
    process.standardOutput = pipe

    Task {
        for try await _ in pipe.fileHandleForReading.bytes.lines {
            print(label)
        }
    }

    try process.run()
}

Task {
    try run(label: "a")
}

Task {
    try run(label: "b")
}
The above will print only a or b, but never both. If I change it to not use AsyncLineSequence, like this:
import Cocoa

DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
    exit(EXIT_SUCCESS)
}

func run(label: String) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/yes")

    let pipe = Pipe()
    process.standardOutput = pipe

    pipe.fileHandleForReading.readabilityHandler = { _ in
        print(label)
    }

    try process.run()
}

Task {
    try run(label: "a")
}

Task {
    try run(label: "b")
}
then the a's and b's are both printed, interleaved.
To add to my confusion, if I use URLSession to get async lines by reading an arbitrary file, it does interleave the print statements of a and b as I'd expect:
import Cocoa

DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
    exit(EXIT_SUCCESS)
}

Task {
    for try await _ in try await URLSession.shared.bytes(from: URL(fileURLWithPath: "/usr/bin/yes")).0.lines {
        print("a")
    }
}

Task {
    for try await _ in try await URLSession.shared.bytes(from: URL(fileURLWithPath: "/usr/bin/yes")).0.lines {
        print("b")
    }
}
If I replace URLSession with FileHandle in the above, then I am back to no interleaving, and all of one file is read followed by the next:
import Cocoa

DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
    exit(EXIT_SUCCESS)
}

Task {
    for try await _ in try FileHandle(forReadingFrom: URL(fileURLWithPath: "/usr/bin/yes")).bytes.lines {
        print("a")
    }
}

Task {
    for try await _ in try FileHandle(forReadingFrom: URL(fileURLWithPath: "/usr/bin/yes")).bytes.lines {
        print("b")
    }
}

When I did this (10 seconds rather than 2 seconds, and in an app rather than a Playground), I do see them jumping back and forth.
Admittedly, it was not one-for-one interleaving (it was lots of “a”s followed by lots of “b”s, and then the process repeats). But there is no reason it would interleave perfectly one-for-one between the two processes, because while lines emits an asynchronous sequence of lines, behind the scenes it is likely reading chunks of output from the pipe, not really consuming it line by line, which would be very inefficient. (And, IMHO, it’s interesting that the URLSession behavior is different, but not terribly surprising.) And you effectively have two processes racing, so there is no reason to expect a graceful, alternating, behavior between the two.
If you replace yes with a program that waits a little between lines of output (e.g., I had it wait for 0.01 seconds between each line of output), then you will see it interleave a bit more frequently. Or, when I added an actor to keep track of which process last emitted a line of output, that was enough to trigger an immediate back-and-forth, processing one line from each yes output alternately.
You might also want to consider the implication of running these two loops with Task { ... }, as that will run each “operation asynchronously as part of a new top-level task on behalf of the current actor” [emphasis added]. You might consider detached tasks or separate actors (to reduce contention on the current actor handling both loops). In my tests, it did not change the results too dramatically, but your mileage may vary. Regardless, it is something to be aware of.
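For example, here is a minimal sketch of the detached-task variant. The body of run(label:) is otherwise unchanged from the question's playground (so import Cocoa still applies); whether this helps will depend on how busy the current actor is:

// Sketch: same as the question's code, but the reading loop and the two
// invocations are launched off the current actor with Task.detached.
func run(label: String) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/yes")

    let pipe = Pipe()
    process.standardOutput = pipe

    Task.detached {
        for try await _ in pipe.fileHandleForReading.bytes.lines {
            print(label)
        }
    }

    try process.run()
}

Task.detached {
    try run(label: "a")
}

Task.detached {
    try run(label: "b")
}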

Related

Testing parallel execution with structured concurrency

I’m testing code that uses an actor, and I’d like to test that I’m properly handling concurrent access and reentrancy. One of my usual approaches would be to use DispatchQueue.concurrentPerform to fire off a bunch of requests from different threads, and ensure my values resolve as expected. However, since the actor uses structured concurrency, I’m not sure how to actually wait for the tasks to complete.
What I’d like to do is something like:
let iterationCount = 100

let allTasksComplete = expectation(description: "allTasksComplete")
allTasksComplete.expectedFulfillmentCount = iterationCount

DispatchQueue.concurrentPerform(iterations: iterationCount) { _ in
    Task {
        // Do some async work here, and assert
        allTasksComplete.fulfill()
    }
}

wait(for: [allTasksComplete], timeout: 1.0)
However the timeout for the allTasksComplete expectation expires every time, regardless of whether the iteration count is 1 or 100, and regardless of the length of the timeout. I’m assuming this has something to do with the fact that mixing structured and DispatchQueue-style concurrency is a no-no?
How can I properly test concurrent access — specifically how can I guarantee that the actor is accessed from different threads, and wait for the test to complete until all expectations are fulfilled?
A few observations:
When testing Swift concurrency, we no longer need to rely upon expectations. We can just mark our tests as async methods. See Asynchronous Tests and Expectations. Here is an async test adapted from that example:
func testDownloadWebDataWithConcurrency() async throws {
    let url = try XCTUnwrap(URL(string: "https://apple.com"), "Expected valid URL.")
    let (_, response) = try await URLSession.shared.data(from: url)
    let httpResponse = try XCTUnwrap(response as? HTTPURLResponse, "Expected an HTTPURLResponse.")
    XCTAssertEqual(httpResponse.statusCode, 200, "Expected a 200 OK response.")
}
FWIW, while we can now use async tests when testing Swift concurrency, we still can use expectations:
func testWithExpectation() {
    let iterations = 100
    let experiment = ExperimentActor()
    let e = self.expectation(description: #function)
    e.expectedFulfillmentCount = iterations

    for i in 0 ..< iterations {
        Task.detached {
            let result = await experiment.reentrantCalculation(i)
            let success = await experiment.isAcceptable(result)
            XCTAssert(success, "Incorrect value")
            e.fulfill()
        }
    }

    wait(for: [e], timeout: 10)
}
You said:
However the timeout for the allTasksComplete expectation expires every time, regardless of whether the iteration count is 1 or 100, and regardless of the length of the timeout.
We cannot comment without seeing a reproducible example of the code replaced with the comment “Do some async work here, and assert”. We do not need to see your actual implementation, but rather construct the simplest possible example that manifests the behavior you describe. See How to create a Minimal, Reproducible Example.
I personally suspect that you have some other, unrelated deadlock. E.g., given that concurrentPerform blocks the thread from which you call it, maybe you are doing something that requires the blocked thread. Also, be careful with Task { ... }, which runs the task on the current actor, so if you are doing something slow and synchronous inside there, that could cause problems. We might use detached tasks, instead.
In short, we cannot diagnose the issue without a Minimal, Reproducible Example.
As a more general observation, one should be wary about mixing GCD (or semaphores or long-lived locks or whatever) with Swift concurrency, because the latter uses a cooperative thread pool, which relies upon assumptions about its threads being able to make forward progress. But if you have GCD API blocking threads, those assumptions may no longer be valid. It may not be the source of the problem here, but I mention it as a cautionary note.
As an aside, concurrentPerform (which constrains the degree of parallelism) only makes sense if the work being executed runs synchronously. Using concurrentPerform to launch a series of asynchronous tasks will not constrain the concurrency at all. (The cooperative thread pool may, but concurrentPerform will not.)
So, for example, if we wanted to test a bunch of calculations in parallel, rather than concurrentPerform, we might use a TaskGroup:
func testWithStructuredConcurrency() async {
    let iterations = 100
    let experiment = ExperimentActor()

    await withTaskGroup(of: Void.self) { group in
        for i in 0 ..< iterations {
            group.addTask {
                let result = await experiment.reentrantCalculation(i)
                let success = await experiment.isAcceptable(result)
                XCTAssert(success, "Incorrect value")
            }
        }
    }

    let count = await experiment.count
    XCTAssertEqual(count, iterations)
}
Now if you wanted to verify concurrent execution within an app, normally I would just profile the app (not unit tests) with Instruments, and either watch intervals in the “Points of Interest” tool or look at the new “Swift Tasks” tool described in WWDC 2022’s Visualize and optimize Swift concurrency video. E.g., here I have launched forty tasks and I can see that my device runs six at a time:
See Alternative to DTSendSignalFlag to identify key events in Instruments? for references about the “Points of Interest” tool.
If you really wanted to write a unit test to confirm concurrency, you could theoretically keep track of your own counters, e.g.,
final class MyAppTests: XCTestCase {
    func testWithStructuredConcurrency() async {
        let iterations = 100
        let experiment = ExperimentActor()

        await withTaskGroup(of: Void.self) { group in
            for i in 0 ..< iterations {
                group.addTask {
                    let result = await experiment.reentrantCalculation(i)
                    let success = await experiment.isAcceptable(result)
                    XCTAssert(success, "Incorrect value")
                }
            }
        }

        let count = await experiment.count
        XCTAssertEqual(count, iterations, "Correct count")

        let degreeOfConcurrency = await experiment.maxDegreeOfConcurrency
        XCTAssertGreaterThan(degreeOfConcurrency, 1, "No concurrency")
    }
}
Where:
import os

private let logger = Logger(subsystem: "Test", category: "experiment")   // definition assumed; not shown in the original

actor ExperimentActor {
    var degreeOfConcurrency = 0
    var maxDegreeOfConcurrency = 0
    var count = 0

    /// Calculate pi with Leibniz series
    ///
    /// Note: I am awaiting a detached task so that I can manifest actor reentrancy.

    func reentrantCalculation(_ index: Int, decimalPlaces: Int = 8) async -> Double {
        let task = Task.detached {
            logger.log("starting \(index)")   // I wouldn’t generally log in a unit test, but it’s a quick visual confirmation that I’m enjoying parallel execution
            await self.increaseConcurrencyCount()

            let threshold = pow(0.1, Double(decimalPlaces))
            var isPositive = true
            var denominator: Double = 1
            var result: Double = 0
            var increment: Double

            repeat {
                increment = 4 / denominator
                if isPositive {
                    result += increment
                } else {
                    result -= increment
                }
                isPositive.toggle()
                denominator += 2
            } while increment >= threshold

            logger.log("finished \(index)")
            await self.decreaseConcurrencyCount()
            return result
        }

        count += 1
        return await task.value
    }

    func increaseConcurrencyCount() {
        degreeOfConcurrency += 1
        if degreeOfConcurrency > maxDegreeOfConcurrency { maxDegreeOfConcurrency = degreeOfConcurrency }
    }

    func decreaseConcurrencyCount() {
        degreeOfConcurrency -= 1
    }

    func isAcceptable(_ result: Double) -> Bool {
        return abs(.pi - result) < 0.0001
    }
}
Please note that if testing/running on the simulator, the cooperative thread pool is somewhat constrained, and will not exhibit the same degree of concurrency that you will see on an actual device.
Also note that if you are testing whether a particular test is exhibiting parallel execution, you might want to disable the parallel execution of tests, themselves, so that other tests do not tie up your cores, preventing any given particular test from enjoying parallel execution.

TaskGroup limit amount of memory usage for lots of tasks

I'm trying to build a chunked file uploading mechanism using modern Swift Concurrency.
There is a streamed file reader which I'm using to read files chunk by chunk, 1 MB at a time.
It has two closures, nextChunk: (DataChunk) -> Void and completion: () -> Void. The first one gets called once for each chunk of data read from the InputStream.
In order to make this reader work with Swift concurrency, I wrote an extension that creates an AsyncStream, which seems to be the most suitable tool for such a case:
public extension StreamedFileReader {
    func read() -> AsyncStream<DataChunk> {
        AsyncStream { continuation in
            self.read(nextChunk: { chunk in
                continuation.yield(chunk)
            }, completion: {
                continuation.finish()
            })
        }
    }
}
Using this AsyncStream I read some file iteratively and make network calls like this:
func process(_ url: URL) async {
    // ...
    do {
        for await chunk in reader.read() {
            let request = // ...
            _ = try await service.upload(data: chunk.data, request: request)
        }
    } catch let error {
        reader.cancelReading()
        print(error)
    }
}
The issue is that there is no limiting mechanism I'm aware of that prevents more than N network calls from executing at once. Thus, when I try to upload a huge file (5 GB), memory consumption grows drastically.
Because of that, the streamed reading of the file is pointless: it would almost be simpler to read the entire file into memory (that's a joke, but it is what it amounts to).
In contrast, if I use good old GCD, everything works like a charm:
func process(_ url: URL) {
    let semaphore = DispatchSemaphore(value: 5) // limit to no more than 5 requests at a given time
    let uploadGroup = DispatchGroup()
    let uploadQueue = DispatchQueue.global(qos: .userInitiated)

    uploadQueue.async(group: uploadGroup) {
        // ...
        reader.read(nextChunk: { chunk in
            let request = // ...
            uploadGroup.enter()
            semaphore.wait()
            service.upload(chunk: chunk, request: request) {
                uploadGroup.leave()
                semaphore.signal()
            }
        }, completion: { _ in
            print("read completed")
        })
    }
}
Admittedly, it is not exactly the same behavior, as it uses a concurrent DispatchQueue whereas the AsyncStream is iterated sequentially.
So I did a little research and found that TaskGroup is probably what I need in this case, since it allows running async tasks in parallel.
I tried it this way:
func process(_ url: URL) async {
    // ...
    do {
        let totalParts = try await withThrowingTaskGroup(of: Void.self) { [service] group -> Int in
            var counter = 1
            for await chunk in reader.read() {
                let request = // ...
                group.addTask {
                    _ = try await service.upload(data: chunk.data, request: request)
                }
                counter = chunk.index
            }
            return counter
        }
    } catch let error {
        reader.cancelReading()
        print(error)
    }
}
In that case memory consumption is even higher than in the AsyncStream iteration example!
I suspect that I need some condition on which to suspend the group or a task, and only call group.addTask when the tasks I'm adding can actually be handled, but I have no idea how to do it.
I found this Q/A and tried to put try await group.next() for each 5th chunk, but it didn't help me at all.
Is there any mechanism similar to DispatchGroup + DispatchSemaphore but for modern concurrency?
UPDATE:
To better demonstrate the difference between all three approaches, here are screenshots of the memory report:
AsyncStream iterating
AsyncStream + TaskGroup (using try await group.next() on each 5th chunk)
GCD DispatchQueue + DispatchGroup + DispatchSemaphore
The key problem is the use of the AsyncStream. Your AsyncStream is reading data and yielding chunks more quickly than they can be uploaded.
Consider this MCVE, where I simulate a stream of 100 chunks, 1 MB each:
import os.log

private let log = OSLog(subsystem: "Test", category: .pointsOfInterest)

struct Chunk {
    let index: Int
    let data: Data
}

actor FileMock {
    let maxChunks = 100
    let chunkSize = 1_000_000
    var index = 0

    func nextChunk() -> Chunk? {
        guard index < maxChunks else { print("done"); return nil }
        defer { index += 1 }
        return Chunk(index: index, data: Data(repeating: UInt8(index & 0xff), count: chunkSize))
    }

    func chunks() -> AsyncStream<Chunk> {
        AsyncStream { continuation in
            index = 0
            while let chunk = nextChunk() {
                os_signpost(.event, log: log, name: "chunk")
                continuation.yield(chunk)
            }
            continuation.finish()
        }
    }
}
And
func uploadAll() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        let chunks = await FileMock().chunks()

        var index = 0
        for await chunk in chunks {
            index += 1
            if index > 5 {
                try await group.next()
            }
            group.addTask { [self] in
                try await upload(chunk)
            }
        }

        try await group.waitForAll()
    }
}

func upload(_ chunk: Chunk) async throws {
    let id = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: #function, signpostID: id, "%d start", chunk.index)
    try await Task.sleep(nanoseconds: 1 * NSEC_PER_SEC)
    os_signpost(.end, log: log, name: #function, signpostID: id, "end")
}
When I do that, I see memory spike to 150 MB as the AsyncStream rapidly yields all of the chunks upfront:
Note that all the Ⓢ signposts, showing when the Data objects are created, are clumped at the start of the process.
Note, the documentation warns us that the sequence might conceivably generate values faster than they can be consumed:
An arbitrary source of elements can produce elements faster than they are consumed by a caller iterating over them. Because of this, AsyncStream defines a buffering behavior, allowing the stream to buffer a specific number of oldest or newest elements. By default, the buffer limit is Int.max, which means the value is unbounded.
Unfortunately, the various buffering alternatives, .bufferingOldest and .bufferingNewest, will only discard values when the buffer is filled. In some AsyncStreams, that might be a viable solution (e.g., if you are tracking the user location, you might only care about the most recent location), but when uploading chunks of the file, you obviously cannot have it discard chunks when the buffer is exhausted.
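For completeness, here is a minimal sketch of a case where a bounded buffer policy is appropriate, namely a made-up stream of readings where only the most recent value matters (the producer loop is purely illustrative):

// Sketch only: not the file-upload case. Because only the latest reading
// matters, dropping older buffered elements is acceptable here.
func readings() -> AsyncStream<Double> {
    AsyncStream(Double.self, bufferingPolicy: .bufferingNewest(1)) { continuation in
        // Produce values faster than a consumer is likely to read them; with
        // .bufferingNewest(1), only the latest unconsumed value is kept.
        let producer = Task {
            for i in 0 ..< 1_000 {
                continuation.yield(Double(i))
                try? await Task.sleep(nanoseconds: 1_000_000)
            }
            continuation.finish()
        }
        continuation.onTermination = { _ in producer.cancel() }
    }
}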
So, rather than AsyncStream, just wrap your file reading with a custom AsyncSequence, which will not read the next chunk until it is actually needed, dramatically reducing peak memory usage, e.g.:
struct FileMock: AsyncSequence {
    typealias Element = Chunk

    struct AsyncIterator: AsyncIteratorProtocol {
        let chunkSize = 1_000_000
        let maxChunks = 100
        var current = 0

        mutating func next() async -> Chunk? {
            os_signpost(.event, log: log, name: "chunk")
            guard current < maxChunks else { return nil }
            defer { current += 1 }
            return Chunk(index: current, data: Data(repeating: UInt8(current & 0xff), count: chunkSize))
        }
    }

    func makeAsyncIterator() -> AsyncIterator {
        return AsyncIterator()
    }
}
And
func uploadAll() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        var index = 0
        for await chunk in FileMock() {
            index += 1
            if index > 5 {
                try await group.next()
            }
            group.addTask { [self] in
                try await upload(chunk)
            }
        }

        try await group.waitForAll()
    }
}
And that avoids loading all 100 MB into memory at once. Note, the vertical scale on the memory graph is different, but you can see that the peak usage is 100 MB less than in the above graph, and the Ⓢ signposts, showing when data is read into memory, are now distributed throughout the graph rather than clumped at the start:
Now, obviously, I am only mocking the reading of a large file with Chunk/Data objects and mocking the upload with a Task.sleep, but it hopefully illustrates the basic idea.
Bottom line, do not use AsyncStream to read the file, but rather consider a custom AsyncSequence or other pattern that reads the file in as the chunks are needed.
A few other observations:
You said “tried to put try await group.next() for each 5th chunk”. Perhaps you can show us what you tried. But note that this answer didn’t say “each 5th chunk” but rather “every chunk after the 5th”. We cannot comment on what you tried unless you show us what you actually tried (or provide a MCVE). And as the above shows, using Instruments’ “Points of Interest” tool can show the actual concurrency.
By the way, when uploading a large asset, consider using a file-based upload rather than Data. File-based uploads are far more memory efficient: regardless of the size of the asset, the memory used during a file-based upload is measured in KB. You can even turn off chunking entirely, and a file-based upload will still use very little memory regardless of the file size. This minimal memory footprint is one of the main reasons to prefer file-based uploads.
The other reason for file-based uploads is that, for iOS especially, one can marry the file-based upload with a background session. With a background session, the user can even leave the app to do something else, and the upload will continue to operate in the background. At that point, you can reassess whether you even need/want to do chunking at all.
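For illustration, here is a minimal sketch of a file-based upload. The endpoint URL and headers are placeholders, the async upload(for:fromFile:) API requires iOS 15/macOS 12, and a background session would use the delegate-based equivalent instead:

import Foundation

// Sketch only: "https://example.com/upload" is a placeholder endpoint.
func uploadFile(at fileURL: URL) async throws {
    var request = URLRequest(url: URL(string: "https://example.com/upload")!)
    request.httpMethod = "POST"
    request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")

    // URLSession streams the file from disk, so the memory footprint stays
    // small regardless of the size of the file.
    let (_, response) = try await URLSession.shared.upload(for: request, fromFile: fileURL)

    guard let httpResponse = response as? HTTPURLResponse, 200 ..< 300 ~= httpResponse.statusCode else {
        throw URLError(.badServerResponse)
    }
}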

How to suspend subsequent tasks until first finishes then share its response with tasks that waited?

I have an actor which throttles requests in a way where the first one will suspend subsequent requests until finished, then share its response with them so they don't have to make the same request.
Here's what I'm trying to do:
import Combine

let cache = Cache()
let operation = OperationStatus()

func execute() async {
    if await operation.isExecuting {
        await operation.waitUntilFinished()
    } else {
        await operation.set(isExecuting: true)
    }

    if let data = await cache.data {
        return data
    }

    let request = myRequest()
    let response = await myService.send(request)
    await cache.set(data: response)
    await operation.set(isExecuting: false)
}

actor Cache {
    var data: myResponse?

    func set(data: myResponse?) {
        self.data = data
    }
}

actor OperationStatus {
    @Published var isExecuting = false
    private var cancellable = Set<AnyCancellable>()

    func set(isExecuting: Bool) {
        self.isExecuting = isExecuting
    }

    func waitUntilFinished() async {
        guard isExecuting else { return }

        return await withCheckedContinuation { continuation in
            $isExecuting
                .first { !$0 } // Wait until execution toggled off
                .sink { _ in continuation.resume() }
                .store(in: &cancellable)
        }
    }
}

// Do something

DispatchQueue.concurrentPerform(iterations: 1_000_000) { _ in execute() }
This ensures one request at a time, and subsequent calls wait until it has finished. It seems to work, but I'm wondering whether there's a pure Swift concurrency way instead of mixing Combine in, and how I can test this. Here's a test I started, but I'm confused about how to test it:
final class OperationStatusTests: XCTestCase {
    private let iterations = 10_000 // 1_000_000
    private let outerIterations = 10

    actor Storage {
        var counter: Int = 0

        func increment() {
            counter += 1
        }
    }

    func testConcurrency() {
        // Given
        let storage = Storage()
        let operation = OperationStatus()
        let promise = expectation(description: "testConcurrency")
        promise.expectedFulfillmentCount = outerIterations * iterations

        @Sendable func execute() async {
            guard await !operation.isExecuting else {
                await operation.waitUntilFinished()
                promise.fulfill()
                return
            }

            await operation.set(isExecuting: true)
            try? await Task.sleep(seconds: 8)
            await storage.increment()
            await operation.set(isExecuting: false)
            promise.fulfill()
        }

        waitForExpectations(timeout: 10)

        // When
        DispatchQueue.concurrentPerform(iterations: outerIterations) { _ in
            (0..<iterations).forEach { _ in
                Task { await execute() }
            }
        }

        // Then
        // XCTAssertEqual... how to test?
    }
}
Before I tackle a more general example, let us first dispense with some natural examples of sequential execution of asynchronous tasks, passing the result of one as a parameter of the next. Consider:
func entireProcess() async throws {
    let value = try await first()
    let value2 = try await subsequent(with: value)
    let value3 = try await subsequent(with: value2)
    let value4 = try await subsequent(with: value3)

    // do something with `value4`
}
Or
func entireProcess() async throws {
    var value = try await first()

    for _ in 0 ..< 4 {
        value = try await subsequent(with: value)
    }

    // do something with `value`
}
This is the easiest way to declare a series of async functions, each of which takes the prior result as the input for the next iteration. So, let us expand the above to include some signposts for Instruments’ “Points of Interest” tool:
import os.log

private let log = OSLog(subsystem: "Test", category: .pointsOfInterest)

func entireProcess() async throws {
    let id = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: #function, signpostID: id, "start")

    var value = try await first()

    for i in 0 ..< 4 {
        os_signpost(.event, log: log, name: #function, "Scheduling: %d with input of %d", i, value)
        value = try await subsequent(with: value)
    }

    os_signpost(.end, log: log, name: #function, signpostID: id, "%d", value)
}

func first() async throws -> Int {
    let id = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: #function, signpostID: id, "start")
    try await Task.sleep(seconds: 1)
    let value = 42
    os_signpost(.end, log: log, name: #function, signpostID: id, "%d", value)
    return value
}

func subsequent(with value: Int) async throws -> Int {
    let id = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: #function, signpostID: id, "%d", value)
    try await Task.sleep(seconds: 1)
    let newValue = value + 1
    defer { os_signpost(.end, log: log, name: #function, signpostID: id, "%d", newValue) }
    return newValue
}
So, there you see a series of requests that pass their result to the subsequent request. All of that os_signpost signpost stuff is so we can visually see that they are running sequentially in Instrument’s “Points of Interest” tool:
You can see ⓢ event signposts as each task is scheduled, and the intervals illustrate the sequential execution of these asynchronous tasks.
This is the easiest way to have dependencies between tasks, passing values from one task to another.
Now, that begs the question of how to generalize the above, where we await the prior task before starting the next one.
One pattern is to write an actor that awaits the result of the prior one. Consider:
actor SerialTasks<Success> {
    private var previousTask: Task<Success, Error>?

    func add(block: @Sendable @escaping () async throws -> Success) {
        previousTask = Task { [previousTask] in
            let _ = await previousTask?.result
            return try await block()
        }
    }
}
Unlike the previous example, this does not require that you have a single function from which you initiate the subsequent tasks. E.g., I have used the above when some separate user interaction requires me to add a new task to the end of the list of previously submitted tasks.
There are two subtle, yet critical, aspects of the above actor:
The add method, itself, must not be an asynchronous function. We need to avoid actor reentrancy. If this were an async function (like in your example), we would lose the sequential execution of the tasks.
The Task has a [previousTask] capture list to capture a copy of the prior task. This way, each task will await the prior one, avoiding any races.
The above can be used to make a series of tasks run sequentially. But it is not passing values between tasks, itself. I confess that I have used this pattern where I simply need sequential execution of largely independent tasks (e.g., sending separate commands being sent to some Process). But it can probably be adapted for your scenario, in which you want to “share its response with [subsequent requests]”.
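For example, usage might look something like the following minimal sketch, in which sendCommand is a hypothetical async throwing function, not part of the actor above:

let serialTasks = SerialTasks<Void>()

// Each call can originate from a separate user interaction; the actor ensures
// each block awaits the completion of the previously added block.
func enqueue(_ command: String) async {
    await serialTasks.add {
        try await sendCommand(command)   // hypothetical async function
    }
}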
I would suggest that you post a separate question with an MCVE, with a practical example of precisely what you wanted to pass from one asynchronous function to another. I have, for example, done permutations of the above, passing an integer from one task to another. But in practice, that is not of great utility, as it gets more complicated when you start dealing with the reality of heterogeneous results parsing. In practice, the simple example with which I started this answer is the most practical pattern.
On the broader question of working with/around actor reentrancy, I would advise keeping an eye out on SE-0306 - Future Directions which explicitly contemplates some potential elegant forthcoming alternatives. I would not be surprised to see some refinements, either in the language itself, or in the Swift Async Algorithms library.
tl;dr
I did not want to encumber the above with discussion regarding your code snippets, but there are quite a few issues. So, if you forgive me, here are some observations:
The attempt to use OperationStatus to enforce sequential execution of async calls will not work because actors feature reentrancy. If you have an async function, every time you hit an await, that is a suspension point at which point another call to that async function is allowed to proceed. The integrity of your OperationStatus logic will be violated. You will not experience serial behavior.
If you are interested in suspension points, I might recommend watching the WWDC 2021 video Swift concurrency: Behind the scenes.
The testConcurrency is calling waitForExpectations before it actually starts any tasks that will fulfill any expectations. That will always timeout.
The testConcurrency is using GCD concurrentPerform, which, in turn, just schedules an asynchronous task and immediately returns. That defeats the entire purpose of concurrentPerform (which is a throttling mechanism for running a series of synchronous tasks in parallel, but not exceed the maximum number of cores on your CPU). Besides, Swift concurrency features its own analog to concurrentPerform, namely the constrained “cooperative thread pool” (also discussed in that video, IIRC), rendering concurrentPerform obsolete in the world of Swift concurrency.
Bottom line, it doesn't make sense to include concurrentPerform in a Swift concurrency codebase. It also does not make sense to use concurrentPerform to launch asynchronous tasks (whether Swift concurrency or GCD). It is for launching a series of synchronous tasks in parallel.
In execute in your test, you have two paths of execution, one which will await some state change and fulfills the expectation without ever incrementing the storage. That means that you will lose some attempts to increment the value. Your total will not match the desired resulting value. Now, if your intent was to drop requests if another was pending, that's fine. But I don't think that was your intent.
In answer to your question about how to test success at the end. You might do something like:
actor Storage {
    private var counter: Int = 0

    func increment() {
        counter += 1
    }

    var value: Int { counter }
}

func testConcurrency() async {
    let storage = Storage()
    let operation = OperationStatus()
    let promise = expectation(description: "testConcurrency")
    let finalCount = outerIterations * iterations
    promise.expectedFulfillmentCount = finalCount

    @Sendable func execute() async {
        guard await !operation.isExecuting else {
            await operation.waitUntilFinished()
            promise.fulfill()
            return
        }

        await operation.set(isExecuting: true)
        try? await Task.sleep(seconds: 1)
        await storage.increment()
        await operation.set(isExecuting: false)
        promise.fulfill()
    }

    // waitForExpectations(timeout: 10)     // this is not where you want to wait; moved below, after the tasks have started

    // DispatchQueue.concurrentPerform(iterations: outerIterations) { _ in   // no point in this
    for _ in 0 ..< outerIterations {
        for _ in 0 ..< iterations {
            Task { await execute() }
        }
    }

    await waitForExpectations(timeout: 10)

    // test the success to see if the stored value was correct
    let value = await storage.value   // to test that you got the right count, fetch the value; note `await`, thus we need to make this an `async` test

    // Then
    XCTAssertEqual(finalCount, value, "Count")
}
Now, this test will fail for a variety of reasons, but hopefully it illustrates how you would verify the success or failure of the test. But note that this tests only that the final result was correct, not that the requests were executed sequentially. The fact that Storage is an actor will hide the fact that they were not really invoked sequentially. I.e., whether the result of one request was actually available to prepare the next is not tested here.
If, as you go through this, you want to really confirm the behavior of your OperationStatus pattern, I would suggest using os_signpost intervals (or simple logging statements where your tasks start and finish). You will see that the separate invocations of the asynchronous execute method are not running sequentially.

GCD - asyncAfter: How to run it synchronously

I have just started learning Swift and I'm currently confused about how to use threading correctly.
What I'm trying to achieve in the following block of code is to execute the print statements inside the dispatch blocks, but in order. The problem is that I want to do this on a background thread rather than the main thread, since this is a long task, and at the same time execute each step in order while adding a delay to each. The current block executes all of the cases together.
I have also taken a look at Timer and semaphores, but without any results.
Any help or explanation of what I'm doing wrong, or what approach I should take, would be appreciated.
let formattedSeries = ["a", "a", "b"]
let dispatchQueue = DispatchQueue(label: "taskQueue")
let a = 1000
let b = 5000

for (index, letter) in formattedSeries.enumerated() {
    switch letter {
    case "a":
        dispatchQueue.asyncAfter(deadline: .now() + .milliseconds(a), execute: {
            print("a executed")
        })
    case "b":
        dispatchQueue.asyncAfter(deadline: .now() + .milliseconds(b), execute: {
            print("b executed")
        })
    default:
        print("default")
    }
}
You can use a dispatch group to force the evaluation of the next letter to wait for evaluation of the previous letter:
let dispatchGroup = DispatchGroup()
let dispatchQueue = DispatchQueue(label: "taskQueue")
let a = 1000
let b = 5000
let formattedSeries = "abbaabba"

print("start", Date().timeIntervalSince1970)

for (index, letter) in formattedSeries.enumerated() {
    dispatchGroup.enter()
    switch letter {
    case "a":
        dispatchQueue.asyncAfter(deadline: .now() + .milliseconds(a), execute: {
            print("a executed", Date().timeIntervalSince1970)
            dispatchGroup.leave()
        })
    case "b":
        dispatchQueue.asyncAfter(deadline: .now() + .milliseconds(b), execute: {
            print("b executed", Date().timeIntervalSince1970)
            dispatchGroup.leave()
        })
    default:
        print("default")
    }
    dispatchGroup.wait()
}
I've added some extra output to prove that the intervals are correct. The output is
start 1580060250.3307471
a executed 1580060251.389974
b executed 1580060256.889923
b executed 1580060262.2758632
a executed 1580060263.372933
a executed 1580060264.373787
b executed 1580060269.37443
b executed 1580060274.375314
a executed 1580060275.4726748
which proves that we evaluated the letters in order, and that the asyncAfter intervals elapse between the prints.
To execute the tasks in order you need an asynchronous operation (a rough sketch of this approach follows the steps below):
Use the class AsynchronousOperation provided in this answer
Create a serial OperationQueue and set maxConcurrentOperationCount to 1
Subclass AsynchronousOperation and put the dispatchQueue.asyncAfter tasks into the main() method of the subclass. Call finish() in the closure (before or after print("a executed"))
Add the operations to the serial operation queue
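Here is a rough sketch of that approach, assuming a simplified AsynchronousOperation base class (the class in the linked answer is more robust, e.g., with thread-safe state handling, which is omitted here for brevity):

import Foundation

// Simplified asynchronous Operation base class: manual KVO notifications only.
class AsynchronousOperation: Operation {
    private var _executing = false
    private var _finished = false

    override var isAsynchronous: Bool { true }
    override var isExecuting: Bool { _executing }
    override var isFinished: Bool { _finished }

    override func start() {
        guard !isCancelled else { finish(); return }
        willChangeValue(forKey: "isExecuting")
        _executing = true
        didChangeValue(forKey: "isExecuting")
        main()
    }

    func finish() {
        willChangeValue(forKey: "isExecuting")
        willChangeValue(forKey: "isFinished")
        _executing = false
        _finished = true
        didChangeValue(forKey: "isExecuting")
        didChangeValue(forKey: "isFinished")
    }
}

// Operation that prints its letter after a delay, then marks itself finished.
class DelayedPrintOperation: AsynchronousOperation {
    private let letter: String
    private let delay: Int
    private let dispatchQueue = DispatchQueue(label: "taskQueue")

    init(letter: String, delayMilliseconds: Int) {
        self.letter = letter
        self.delay = delayMilliseconds
        super.init()
    }

    override func main() {
        dispatchQueue.asyncAfter(deadline: .now() + .milliseconds(delay)) {
            print("\(self.letter) executed")
            self.finish()
        }
    }
}

// A serial OperationQueue: each operation starts only after the previous one finishes.
let queue = OperationQueue()
queue.maxConcurrentOperationCount = 1

for letter in ["a", "a", "b"] {
    queue.addOperation(DelayedPrintOperation(letter: letter, delayMilliseconds: letter == "a" ? 1000 : 5000))
}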

Multiple workers in Swift Command Line Tool

When writing a Command Line Tool (CLT) in Swift, I want to process a lot of data. I've determined that my code is CPU bound and performance could benefit from using multiple cores. Thus I want to parallelize parts of the code. Say I want to achieve the following pseudo-code:
Fetch items from database
Divide items in X chunks
Process chunks in parallel
Wait for chunks to finish
Do some other processing (single-thread)
Now I've been using GCD, and a naive approach would look like this:
let group = dispatch_group_create()
let queue = dispatch_queue_create("", DISPATCH_QUEUE_CONCURRENT)

for chunk in chunks {
    dispatch_group_async(group, queue) {
        worker(chunk)
    }
}

dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
However GCD requires a run loop, so the code will hang as the group is never executed. The runloop can be started with dispatch_main(), but it never exits. It is also possible to run the NSRunLoop just a few seconds, however that doesn't feel like a solid solution. Regardless of GCD, how can this be achieved using Swift?
I mistakenly interpreted the blocked thread as a hanging program. The work will execute just fine without a run loop. The code in the question will run fine, blocking the main thread until the whole group has finished.
So, say chunks contains 4 items of workload; the following code spins up 4 concurrent workers and then waits for all of them to finish:
let group = DispatchGroup()
let queue = DispatchQueue(label: "", attributes: .concurrent)

for chunk in chunks {
    queue.async(group: group, execute: DispatchWorkItem() {
        do_work(chunk)
    })
}

_ = group.wait(timeout: .distantFuture)
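As an aside, since the work here is synchronous and CPU-bound, DispatchQueue.concurrentPerform (discussed elsewhere in this document) is another option. A minimal sketch, assuming the chunks array and worker function from the question:

// Runs the iterations in parallel, throttled to the available cores, and
// blocks the calling thread until every iteration has completed, so no
// dispatch group or run loop is needed.
DispatchQueue.concurrentPerform(iterations: chunks.count) { index in
    worker(chunks[index])
}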
Just like with an Objective-C CLI, you can make your own run loop using NSRunLoop.
Here's one possible implementation, modeled from this gist:
class MainProcess {
    var shouldExit = false

    func start () {
        // do your stuff here
        // set shouldExit to true when you're done
    }
}

println("Hello, World!")

var runLoop: NSRunLoop
var process: MainProcess

autoreleasepool {
    runLoop = NSRunLoop.currentRunLoop()
    process = MainProcess()
    process.start()

    while (!process.shouldExit && (runLoop.runMode(NSDefaultRunLoopMode, beforeDate: NSDate(timeIntervalSinceNow: 2)))) {
        // do nothing
    }
}
As Martin points out, you can use NSDate.distantFuture() as NSDate instead of NSDate(timeIntervalSinceNow: 2). (The cast is necessary because the distantFuture() method signature indicates it returns AnyObject.)
If you need to access CLI arguments see this answer. You can also return exit codes using exit().
A Swift 3 minimal implementation of Aaron Brager's solution, which simply combines autoreleasepool and RunLoop.current.run(...) until you break the loop:
var shouldExit = false

doSomethingAsync() { _ in
    defer {
        shouldExit = true
    }
}

autoreleasepool {
    var runLoop = RunLoop.current
    while (!shouldExit && (runLoop.run(mode: .defaultRunLoopMode, before: Date.distantFuture))) {}
}
I think CFRunLoop is much easier than NSRunLoop in this case
func main() {
    /**** YOUR CODE START ****/
    let group = dispatch_group_create()
    let queue = dispatch_queue_create("", DISPATCH_QUEUE_CONCURRENT)

    for chunk in chunks {
        dispatch_group_async(group, queue) {
            worker(chunk)
        }
    }

    dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
    /**** END ****/
}

let runloop = CFRunLoopGetCurrent()

CFRunLoopPerformBlock(runloop, kCFRunLoopDefaultMode) { () -> Void in
    dispatch_async(dispatch_queue_create("main", nil)) {
        main()
        CFRunLoopStop(runloop)
    }
}

CFRunLoopRun()