Task @Sendable operation - Swift

I wrote some simple code:
class App {
    private var value = 0

    func start() async throws {
        await withTaskGroup(of: Void.self) { group in
            for _ in 1...100 {
                group.addTask(operation: self.increment) // 1
                group.addTask {
                    await self.increment() // 2
                }
                group.addTask {
                    self.value += 1 // 3
                }
            }
        }
    }

    @Sendable private func increment() async {
        self.value += 1 // 4
    }
}
I got compile-time warnings at the lines marked 2 and 3:
Capture of 'self' with non-sendable type 'App' in a '@Sendable' closure
However, with Thread Sanitizer enabled and lines 2 and 3 removed, I got a Thread Sanitizer runtime warning at the line marked 4:
Swift access race in (1) suspend resume partial function for asyncTest.App.increment@Sendable () async -> () at 0x106003c80
So I have some questions:
What is the difference between these three ways of calling addTask?
What does the @Sendable attribute do?
How can I make the increment() function thread-safe (data-race free)?

To illustrate the problem, consider:
class Counter {
    private var value = 0

    func incrementManyTimes() async {
        await withTaskGroup(of: Void.self) { group in
            for _ in 1...1_000_000 {
                group.addTask {
                    self.increment() // no `await` as this is not an `async` method
                }
            }
            await group.waitForAll()

            if value != 1_000_000 {
                print("not thread-safe apparently; value =", value) // not thread-safe apparently; value = 994098
            } else {
                print("ok")
            }
        }
    }

    private func increment() { // note, this isn't `async` (as there is no `await` suspension point in here)
        value += 1
    }
}
That illustrates that it is not thread-safe, validating the warning from TSAN. (Note, I bumped the iteration count to make it easier to manifest the symptoms of non-thread-safe code.)
So, how would you make it thread-safe? Use an actor:
actor Counter {
    private var value = 0

    func incrementManyTimes() async {
        await withTaskGroup(of: Void.self) { group in
            for _ in 1...1_000_000 {
                group.addTask {
                    await self.increment()
                }
            }
            await group.waitForAll()

            if value != 1_000_000 {
                print("not thread-safe apparently; value =", value)
            } else {
                print("ok") // ok
            }
        }
    }

    private func increment() { // note, this still isn't `async`
        value += 1
    }
}
If you really want to use a class, add your own old-school synchronization. Here I am using a lock, but you could use a serial GCD queue, or whatever you want.
class Counter {
    private var value = 0
    let lock = NSLock()

    func incrementManyTimes() async {
        await withTaskGroup(of: Void.self) { group in
            for _ in 1...1_000_000 {
                group.addTask {
                    self.increment()
                }
            }
            await group.waitForAll()

            if value != 1_000_000 {
                print("not thread-safe apparently; value =", value)
            } else {
                print("ok") // ok
            }
        }
    }

    private func increment() {
        lock.synchronize {
            value += 1
        }
    }
}
extension NSLocking {
    func synchronize<T>(block: () throws -> T) rethrows -> T {
        lock()
        defer { unlock() }
        return try block()
    }
}
Or, if you want to make Counter a Sendable type too, confident that you are properly doing the synchronization yourself as above, you can mark it final and declare that it is Sendable, admittedly @unchecked by the compiler:
final class Counter: @unchecked Sendable {
    private var value = 0
    let lock = NSLock()

    func incrementManyTimes() async {
        // same as above
    }

    private func increment() {
        lock.synchronize {
            value += 1
        }
    }
}
But because the compiler cannot possibly reason about the actual “sendability” itself, you have to designate this as @unchecked Sendable to let it know that you personally have verified it really is Sendable.
But actor is the preferred mechanism for ensuring thread-safety, eliminating the need for this custom synchronization logic.
For more information, see WWDC 2021 video Protect mutable state with Swift actors or 2022’s Eliminate data races using Swift Concurrency.
See the Sendable documentation for a discussion of what the @Sendable attribute does.
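To make question 2 a bit more concrete, here is a minimal sketch (the names are purely illustrative, not from your code): @Sendable tells the compiler that a closure or function may be called from a concurrent context, so it must not capture mutable local state or values of non-Sendable types; that is exactly the check behind the warning you quoted.
// A class with mutable state; it is not Sendable.
final class NotSendable {
    var count = 0
}

// `@Sendable` promises this closure is safe to call from concurrent contexts,
// so the compiler restricts what it may capture.
func runConcurrently(_ work: @escaping @Sendable () -> Void) {
    Task { work() }
}

func demo() {
    var local = 0
    let object = NotSendable()

    runConcurrently {
        // local += 1        // rejected: mutation of a captured `var` in a `@Sendable` closure
        // object.count += 1 // rejected: capture of non-Sendable type 'NotSendable'
        print("fine: captures nothing mutable or non-Sendable")
    }

    _ = (local, object) // silence unused-variable warnings in this sketch
}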
BTW, I suspect you know this, but for the sake of future readers, while increment is fine for illustrative purposes, it is not a good candidate for parallelism. This parallelized rendition is actually slower than a simple, single-threaded solution. To achieve performance gains of parallelism, you need to have enough work on each thread to justify the modest overhead that parallelism entails.
Also, when testing parallelism, be wary of using the simulator, which significantly and artificially constrains the cooperative thread pool used by Swift concurrency. Test on a macOS target or a physical device. Neither the simulator nor a playground is a good testbed for this sort of parallelism exercise.

Related

Unit testing a class that depends on an async function

I have a view model with a state property enum that has 3 cases.
protocol ServiceType {
    func doSomething() async
}

@MainActor
final class ViewModel {
    enum State {
        case notLoaded
        case loading
        case loaded
    }

    private let service: ServiceType
    var state: State = .notLoaded

    init(service: ServiceType) {
        self.service = service
    }

    func load() async {
        state = .loading
        await service.doSomething()
        state = .loaded
    }
}
I want to write a unit test that asserts that after load is called, but before the async function returns, state == .loading.
If I were using completion handlers, I could create a spy that implements ServiceType and captures the completion handler but doesn't call it. If I were using Combine, I could use a scheduler to control execution.
Is there an equivalent solution when using Swift's new concurrency model?
As you're injecting the dependency via a protocol, you're in a very good position to provide a fake for that protocol, one that you have full control over from the unit tests:
class ServiceFake: ServiceType {
    var doSomethingReply: (CheckedContinuation<Void, Never>) -> Void = { _ in }

    func doSomething() async {
        // this creates a continuation, but does nothing with it
        // as it waits for the owners to instruct how to proceed
        await withCheckedContinuation { doSomethingReply($0) }
    }
}
With the above in place, your unit tests are in full control: they know when/if doSomething was called, and can instruct how the function should respond.
final class ViewModelTests: XCTestCase {
    func test_viewModelIsLoadingWhileDoSomethingSuspends() {
        let serviceFake = ServiceFake()
        let viewModel = ViewModel(service: serviceFake)
        XCTAssertEqual(viewModel.state, .notLoaded)

        let expectation = XCTestExpectation(description: "doSomething() was called")
        // just fulfilling the expectation, because we ignore the continuation
        // the execution of `load()` will not pass the `doSomething()` call
        serviceFake.doSomethingReply = { _ in
            expectation.fulfill()
        }

        Task {
            await viewModel.load()
        }

        wait(for: [expectation], timeout: 0.1)
        XCTAssertEqual(viewModel.state, .loading)
    }
}
The above test makes sure doSomething() is called, as you likely don't want to validate the view model state until you're sure the execution of load() has reached the expected place - after all, load() is called asynchronously, so we need an expectation to make sure the test properly waits until execution reaches the expected point.
The above technique is very similar to a mock/stub, where the implementation is replaced with a unit-test provided one. You could even go further, and just have an async closure instead of a continuation-based one:
class ServiceFake: ServiceType {
    var doSomethingReply: () async -> Void = { }

    func doSomething() async {
        await doSomethingReply()
    }
}
And while this would give even greater control in the unit tests, it also pushes the burden of creating the continuations onto those unit tests.
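For example, to get the same "suspend and never resume" behaviour as the first fake, the test itself now has to supply the continuation (a sketch reusing the expectation from the earlier test):
serviceFake.doSomethingReply = {
    expectation.fulfill()
    // The test creates the suspension itself by never resuming this continuation
    // (the checked variant will log a "leaked" warning at runtime, which is fine here).
    await withCheckedContinuation { (_: CheckedContinuation<Void, Never>) in }
}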
You can handle this in a similar way to how you were handling it with completion handlers: you have the choice to either delay the completion of doSomething using Task.sleep(nanoseconds:), or to use a continuation to suspend execution forever by never resuming it, just as you would by never calling the completion handler.
So your mock ServiceType for the delay test scenario looks like:
struct HangingService: ServiceType {
    func doSomething() async {
        let seconds: UInt64 = 1 // Delay by seconds
        try? await Task.sleep(nanoseconds: seconds * 1_000_000_000)
    }
}
Or for the forever suspended scenario:
class HangingService: ServiceType {
    private var continuation: CheckedContinuation<Void, Never>?

    deinit {
        continuation?.resume()
    }

    func doSomething() async {
        await withCheckedContinuation { continuation in
            self.continuation?.resume()
            self.continuation = continuation
        }
    }
}

Swift MetalKit buffer completion handler vs waitForCompletion?

When creating a render pass in MetalKit, is it better in terms of performance to wait for completion or to add a completion handler? If I use the completion handler, then I'll end up with a lot of nested closures, but I think waitForCompletion might block a thread. If the completion handler is preferred, is there a better way in Swift to do this without having to use so many nested closures?
For example,
buffer.addCompletedHandler { _ in
    // ... next task
    buffer2.addCompletedHandler { _ in
        // ... etc etc
    }
}
The other people are right in telling you that this is probably not what you want to do, and you should go educate yourself on how others have created render loops in Metal.
That said, if you actually have use cases for non-blocking versions of waitUntilCompleted or waitUntilScheduled, you can create and use your own until Apple gets around to providing the same.
public extension MTLCommandBuffer {
    /// Wait until this command buffer is scheduled for execution on the GPU.
    var schedulingCompletion: Void {
        get async {
            await withUnsafeContinuation { continuation in
                addScheduledHandler { _ in
                    continuation.resume()
                }
            }
        }
    }

    /// Wait until the GPU has finished executing the commands in this buffer.
    var completion: Void {
        get async {
            await withUnsafeContinuation { continuation in
                addCompletedHandler { _ in
                    continuation.resume()
                }
            }
        }
    }
}
But I doubt those properties will improve any code: all of the task-ordering code necessary to ensure that the "handlers" are added before commit is called is worse than the callbacks.
let string: String = await withTaskGroup(of: String.self) { group in
    let buffer = MTLCreateSystemDefaultDevice()!.makeCommandQueue()!.makeCommandBuffer()!

    group.addTask {
        await buffer.schedulingCompletion
        return "2"
    }
    group.addTask {
        await buffer.completion
        return "3"
    }
    group.addTask {
        buffer.commit()
        return "1"
    }

    return await .init(group)
}
XCTAssertEqual(string, "123")
public extension String {
    init<Strings: AsyncSequence>(_ strings: Strings) async rethrows
    where Strings.Element == String {
        self = try await strings.reduce(into: .init()) { $0.append($1) }
    }
}
However, while I'm unconvinced on addScheduledHandler being improvable, I think pairing addCompletedHandler and commit has more potential.
public extension MTLCommandBuffer {
    /// Commit this buffer and wait for the GPU to finish executing its commands.
    func complete() async {
        await withUnsafeContinuation { continuation in
            self.addCompletedHandler { _ in
                continuation.resume()
            }
            commit()
        } as Void
    }
}
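With that, the nesting from the question flattens into sequential awaits (a sketch; buffer and buffer2 stand in for your own command buffers):
await buffer.complete()
// ... next task
await buffer2.complete()
// ... etc.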
You are supposed to use MTLCommandQueues and MTLEvents for serializing your GPU work, not completion handlers. Completion handlers are meant to be used only in cases where you need CPU-GPU synchronization, e.g. when you need to read back the result of a GPU calculation on the CPU, or when you need to add back pressure, for example when you only want to have a certain number of frames in flight concurrently.
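As a rough sketch of that GPU-side ordering (illustrative only; the encoder and pipeline setup is omitted), one command buffer signals an MTLEvent and another waits for it:
import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let event = device.makeEvent()!

// First buffer: encode some work, then signal the event when it finishes.
let first = queue.makeCommandBuffer()!
// ... encode work into `first` ...
first.encodeSignalEvent(event, value: 1)
first.commit()

// Second buffer: the GPU waits for the signal before running these commands.
let second = queue.makeCommandBuffer()!
second.encodeWaitForEvent(event, value: 1)
// ... encode dependent work into `second` ...
second.commit()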

Cannot find in scope in swift in DispatchQueue.main.async

I have this code in Swift:
import UIKit
import Foundation

func whatever() async -> Int {
    return 2;
}

func test() {
    Task.init {
        var testing: Int = 0
        do {
            testing = try await whatever()
        }
        catch {
            print("some error happened")
        }
    }

    DispatchQueue.main.async {
        print("from dispa")
        if (testing == 0) {
            print("testing was never set")
        }
    }
}
test()
It is a playground, and upon running it I get "Cannot find 'testing' in scope" in the if statement inside the DispatchQueue closure.
What am I doing wrong? Moving the code inside the Task.init closure produces a different error: "Reference to captured var 'testing' in concurrently-executing code".
The variable testing is declared inside the Task.init scope right now, meaning the DispatchQueue closure has no access to it. To use it in both closures, you must move it to a scope to which both closures have access (i.e., a shared parent scope).
Once you've done that, it's a little unclear what exactly you want to happen here, but I'm guessing this is the result you are looking for. Note that you should be careful about mixing Tasks and DispatchQueues -- they are different paradigms and do not necessarily lead to the results that you think they would.
var testing = 0

func whatever() async throws -> Int {
    return 2 // note that nothing async or throwing is actually happening here
}

func test() {
    Task {
        do {
            testing = try await whatever()
        }
        catch {
            print("some error happened")
        }
    }

    DispatchQueue.main.async {
        print("from dispatch")
        if testing == 0 {
            print("testing was never set")
        }
    }
}
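If you want to stay within a single paradigm instead, a sketch (illustrative, not a drop-in for the playground code) is to keep a handle to the Task and await its value where you need it, rather than reaching for DispatchQueue:
func test() async {
    let task = Task { () -> Int in
        do {
            return try await whatever()
        } catch {
            print("some error happened")
            return 0
        }
    }

    let testing = await task.value // suspends until the task finishes
    if testing == 0 {
        print("testing was never set")
    }
}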

Concurrently run async tasks with unnamed async let

With Swift concurrency, is it possible to have something almost like an 'unnamed' async let?
Here is an example. You have the following actor:
actor MyActor {
    private var foo: Int = 0
    private var bar: Int = 0

    func setFoo(to value: Int) async {
        foo = value
    }

    func setBar(to value: Int) async {
        bar = value
    }

    func printResult() {
        print("foo =", foo)
        print("bar =", bar)
    }
}
Now I want to set foo and bar using the given methods. Simple usage would look like the following:
let myActor = MyActor()
await myActor.setFoo(to: 5)
await myActor.setBar(to: 10)
await myActor.printResult()
However, this code runs sequentially. For all intents and purposes, assume setFoo(to:) and setBar(to:) may be long-running tasks. You can also assume the methods are mutually exclusive (they don't share variables and won't affect each other).
To make this code concurrent, async let can be used. However, this just starts the tasks; they are only awaited later on. In my example you'll notice I don't need the return values from these methods. All I need is for the previous tasks to have completed before printResult() is called.
I could come up with the following:
let myActor = MyActor()
async let tempFoo: Void = myActor.setFoo(to: 5)
async let tempBar: Void = myActor.setBar(to: 10)
let _ = await [tempFoo, tempBar]
await myActor.printResult()
Explicitly creating these tasks and then awaiting an array of them seems incorrect. Is this really the best way?
This can be achieved with a task group using withTaskGroup(of:returning:body:). The method calls are individual tasks, and then we await waitForAll() which continues when all tasks have completed.
Code:
await withTaskGroup(of: Void.self) { group in
    let myActor = MyActor()

    group.addTask {
        await myActor.setFoo(to: 5)
    }
    group.addTask {
        await myActor.setBar(to: 10)
    }

    await group.waitForAll()
    await myActor.printResult()
}
I made your actor a class to allow concurrent execution of the two methods.
import Foundation

final class Jeep {
    private var foo: Int = 0
    private var bar: Int = 0

    func setFoo(to value: Int) {
        print("begin foo")
        foo = value
        sleep(1)
        print("end foo \(value)")
    }

    func setBar(to value: Int) {
        print("begin bar")
        bar = value
        sleep(2)
        print("end bar \(bar)")
    }

    func printResult() {
        print("printResult foo:\(foo), bar:\(bar)")
    }
}

let jeep = Jeep()
let blocks = [
    { jeep.setFoo(to: 1) },
    { jeep.setBar(to: 2) },
]

// ...WORK

RunLoop.current.run(mode: RunLoop.Mode.default, before: NSDate(timeIntervalSinceNow: 5) as Date)
Replace WORK with one of these:
// no concurrency, ordered execution
for block in blocks {
    block()
}
jeep.printResult()
// concurrency, unordered execution, tasks created upfront programmatically
Task {
    async let foo: Void = blocks[0]()
    async let bar: Void = blocks[1]()
    _ = await [foo, bar]
    jeep.printResult()
}
// concurrency, unordered execution, tasks created upfront, but started by the system (I think)
Task {
    await withTaskGroup(of: Void.self) { group in
        for block in blocks {
            group.addTask { block() }
        }
    }
    // when the initialization closure exits, all child tasks are awaited implicitly
    jeep.printResult()
}

// concurrency, unordered execution, awaited in order
Task {
    let tasks = blocks.map { block in
        Task { block() }
    }
    for task in tasks {
        await task.value
    }
    jeep.printResult()
}
// tasks created upfront, all tasks start concurrently, produce result as soon as they finish
let stream = AsyncStream<Void> { continuation in
    Task {
        let tasks = blocks.map { block in
            Task { block() }
        }
        for task in tasks {
            continuation.yield(await task.value)
        }
        continuation.finish()
    }
}

Task {
    // now waiting for all values, bad use of a stream, I know
    for await value in stream {
        print(value as Any)
    }
    jeep.printResult()
}

Use queue and semaphore for concurrency and property wrapper?

I'm trying to create a thread-safe property wrapper. I could only think of GCD queues and semaphores as being the most Swifty and reliable approaches. Are semaphores just more performant (if that's true), or is there another reason to use one over the other for concurrency?
Below are two variants of atomic property wrappers:
@propertyWrapper
struct Atomic<Value> {
    private var value: Value
    private let queue = DispatchQueue(label: "Atomic serial queue")

    var wrappedValue: Value {
        get { queue.sync { value } }
        set { queue.sync { value = newValue } }
    }

    init(wrappedValue value: Value) {
        self.value = value
    }
}

@propertyWrapper
struct Atomic2<Value> {
    private var value: Value
    private var semaphore = DispatchSemaphore(value: 1)

    var wrappedValue: Value {
        get {
            semaphore.wait()
            let temp = value
            semaphore.signal()
            return temp
        }
        set {
            semaphore.wait()
            value = newValue
            semaphore.signal()
        }
    }

    init(wrappedValue value: Value) {
        self.value = value
    }
}
struct MyStruct {
    @Atomic var counter = 0
    @Atomic2 var counter2 = 0
}

func test() {
    var myStruct = MyStruct()
    DispatchQueue.concurrentPerform(iterations: 1000) {
        myStruct.counter += $0
        myStruct.counter2 += $0
    }
}
How can they be properly tested and measured to see the difference between the two implementations and if they even work?
FWIW, another option is the reader-writer pattern with a concurrent queue, where reads are done synchronously and are allowed to run concurrently with respect to other reads, while writes are done asynchronously with a barrier (i.e., not concurrently with respect to any other reads or writes):
@propertyWrapper
class Atomic<Value> {
    private var value: Value
    private let queue = DispatchQueue(label: "com.domain.app.atomic", attributes: .concurrent)

    var wrappedValue: Value {
        get { queue.sync { value } }
        set { queue.async(flags: .barrier) { self.value = newValue } }
    }

    init(wrappedValue value: Value) {
        self.value = value
    }
}
Yet another is NSLock:
@propertyWrapper
class Atomic<Value> {
    private var value: Value
    private var lock = NSLock()

    var wrappedValue: Value {
        get { lock.synchronized { value } }
        set { lock.synchronized { value = newValue } }
    }

    init(wrappedValue value: Value) {
        self.value = value
    }
}
where
extension NSLocking {
    func synchronized<T>(block: () throws -> T) rethrows -> T {
        lock()
        defer { unlock() }
        return try block()
    }
}
Or you can use unfair locks:
@propertyWrapper
class SynchronizedUnfairLock<Value> {
    private var value: Value
    private var lock = UnfairLock()

    var wrappedValue: Value {
        get { lock.synchronized { value } }
        set { lock.synchronized { value = newValue } }
    }

    init(wrappedValue value: Value) {
        self.value = value
    }
}
Where
// One should not use `os_unfair_lock` directly in Swift (because Swift
// can move `struct` types), so we'll wrap it in a `UnsafeMutablePointer`.
// See https://github.com/apple/swift/blob/88b093e9d77d6201935a2c2fb13f27d961836777/stdlib/public/Darwin/Foundation/Publishers%2BLocking.swift#L18
// for stdlib example of this pattern.
final class UnfairLock: NSLocking {
    private let unfairLock: UnsafeMutablePointer<os_unfair_lock> = {
        let pointer = UnsafeMutablePointer<os_unfair_lock>.allocate(capacity: 1)
        pointer.initialize(to: os_unfair_lock())
        return pointer
    }()

    deinit {
        unfairLock.deinitialize(count: 1)
        unfairLock.deallocate()
    }

    func lock() {
        os_unfair_lock_lock(unfairLock)
    }

    func tryLock() -> Bool {
        os_unfair_lock_trylock(unfairLock)
    }

    func unlock() {
        os_unfair_lock_unlock(unfairLock)
    }
}
We should recognize that while these, and yours, offer atomicity, you have to be careful because, depending upon how you use them, they may not be thread-safe.
Consider this simple experiment, where we increment an integer ten million times:
func threadSafetyExperiment() {
    @Atomic var foo = 0

    DispatchQueue.global().async {
        DispatchQueue.concurrentPerform(iterations: 10_000_000) { _ in
            foo += 1
        }
        print(foo)
    }
}
You’d expect foo to be equal to 10,000,000, but it won’t be. That is because the whole interaction of “retrieve the value and increment it and save it” needs to be wrapped in a single synchronization mechanism.
But you can add an atomic increment method:
extension Atomic where Value: Numeric {
    func increment(by increment: Value) {
        lock.synchronized { value += increment }
    }
}
And then this works fine:
func threadSafetyExperiment() {
    @Atomic var foo = 0

    DispatchQueue.global().async {
        DispatchQueue.concurrentPerform(iterations: 10_000_000) { _ in
            _foo.increment(by: 1)
        }
        print(foo)
    }
}
How can they be properly tested and measured to see the difference between the two implementations and if they even work?
A few thoughts:
I’d suggest doing far more than 1,000 iterations. You want to do enough iterations that the results are measured in seconds, not milliseconds. I used ten million iterations in my example.
The unit testing framework is ideal both for testing correctness and for measuring performance using the measure method (which repeats each performance test ten times, with the results captured in the unit test reports); see the sketch after these points.
So, create a project with a unit test target (or add a unit test target to an existing project if you want), create unit tests, and execute them with command-U.
If you edit the scheme for your target, you can choose to randomize the order of your tests, to make sure the order in which they execute doesn't affect the performance.
I would also make the test target use a release build to make sure you’re testing an optimized build.
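A minimal sketch of such a performance test, assuming the lock-based class Atomic and the increment(by:) extension shown above (the names come from this answer, not a library):
import XCTest
import Foundation

final class AtomicPerformanceTests: XCTestCase {
    func testLockBasedAtomicPerformance() {
        measure {
            let counter = Atomic(wrappedValue: 0) // the lock-based class above
            DispatchQueue.concurrentPerform(iterations: 10_000_000) { _ in
                counter.increment(by: 1)
            }
            XCTAssertEqual(counter.wrappedValue, 10_000_000)
        }
    }
}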
Needless to say, while I am stress-testing the locks by running 10m iterations, incrementing by one for each iteration, that is horribly inefficient. There simply is not enough work on each thread to justify the overhead of the thread handling. One would generally stride through the data set and do more iterations per thread, reducing the number of synchronizations.
The practical implication of this is that in a well-designed parallel algorithm, where you are doing enough work to justify the multiple threads, you reduce the number of synchronizations taking place. Thus, the minor variances between the different synchronization techniques become unobservable. If the synchronization mechanism is having an observable performance difference, this probably suggests a deeper problem in the parallelization algorithm. Focus on reducing synchronizations, not on making synchronizations faster.
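To make that last point concrete, here is a sketch of striding (again assuming the lock-based Atomic and increment(by:) from above): each chunk accumulates locally with no synchronization and synchronizes only once per chunk.
func stridedIncrementExperiment() {
    let counter = Atomic(wrappedValue: 0) // the lock-based class above
    let iterations = 10_000_000
    let chunks = 8

    DispatchQueue.concurrentPerform(iterations: chunks) { _ in
        var localSum = 0                      // no synchronization needed here
        for _ in 0 ..< (iterations / chunks) {
            localSum += 1                     // stand-in for real per-element work
        }
        counter.increment(by: localSum)       // one synchronization per chunk
    }

    print(counter.wrappedValue)               // 10000000
}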