Multiple workers in Swift Command Line Tool

When writing a Command Line Tool (CLT) in Swift, I want to process a lot of data. I've determined that my code is CPU bound and performance could benefit from using multiple cores. Thus I want to parallelize parts of the code. Say I want to achieve the following pseudo-code:
Fetch items from database
Divide items into X chunks
Process chunks in parallel
Wait for chunks to finish
Do some other processing (single-thread)
Now I've been using GCD, and a naive approach would look like this:
let group = dispatch_group_create()
let queue = dispatch_queue_create("", DISPATCH_QUEUE_CONCURRENT)
for chunk in chunks {
dispatch_group_async(group, queue) {
worker(chunk)
}
}
dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
However GCD requires a run loop, so the code will hang as the group is never executed. The runloop can be started with dispatch_main(), but it never exits. It is also possible to run the NSRunLoop just a few seconds, however that doesn't feel like a solid solution. Regardless of GCD, how can this be achieved using Swift?

I mistook the blocked main thread for a hanging program. The work executes just fine without a run loop: the code in the question runs as written and blocks the main thread until the whole group has finished.
So say chunks contains 4 items of workload, the following code spins up 4 concurrent workers, and then waits for all of the workers to finish:
let group = DispatchGroup()
let queue = DispatchQueue(label: "", attributes: .concurrent)
for chunk in chunks {
    // Each chunk is processed by its own work item on the concurrent queue
    queue.async(group: group) {
        do_work(chunk)
    }
}
// Blocks the calling thread until every work item in the group has finished
group.wait()
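For a complete picture of the steps listed in the question (chunk, process in parallel, wait, then continue single-threaded), here is a minimal sketch. The names `processChunk`, `makeChunks`, and `PartialSums`, the sum-of-squares workload, and the range standing in for the database fetch are all assumptions made for illustration; they do not come from the original post.

```swift
import Foundation

// Hypothetical workload: sum the squares of the numbers in one chunk.
func processChunk(_ chunk: ArraySlice<Int>) -> Int {
    chunk.reduce(0) { $0 + $1 * $1 }
}

// Splits `items` into at most `count` contiguous chunks of near-equal size.
func makeChunks(_ items: [Int], count: Int) -> [ArraySlice<Int>] {
    guard !items.isEmpty, count > 0 else { return [] }
    let size = (items.count + count - 1) / count  // ceiling division
    return stride(from: 0, to: items.count, by: size).map {
        items[$0..<min($0 + size, items.count)]
    }
}

// Lock-protected storage so workers can publish results from any thread.
final class PartialSums: @unchecked Sendable {
    private let lock = NSLock()
    private var values: [Int]

    init(count: Int) { values = Array(repeating: 0, count: count) }

    func set(_ value: Int, at index: Int) {
        lock.lock(); defer { lock.unlock() }
        values[index] = value
    }

    var total: Int {
        lock.lock(); defer { lock.unlock() }
        return values.reduce(0, +)
    }
}

let items = Array(1...1_000)               // stand-in for the database fetch
let chunks = makeChunks(items, count: 4)
let sums = PartialSums(count: chunks.count)

let group = DispatchGroup()
let workQueue = DispatchQueue(label: "example.workers", attributes: .concurrent)

for (index, chunk) in chunks.enumerated() {
    workQueue.async(group: group) {
        sums.set(processChunk(chunk), at: index)
    }
}

group.wait()                               // blocks the main thread; no run loop needed
let total = sums.total                     // single-threaded work continues here
print("total:", total)                     // prints "total: 333833500"
```

Each worker writes only to its own slot, and the lock is there so the final read on the main thread is safe no matter which thread produced each value.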

Just like with an Objective-C CLI, you can make your own run loop using NSRunLoop.
Here's one possible implementation, modeled from this gist:
class MainProcess {
var shouldExit = false
func start () {
// do your stuff here
// set shouldExit to true when you're done
}
}
println("Hello, World!")
var runLoop : NSRunLoop
var process : MainProcess
autoreleasepool {
runLoop = NSRunLoop.currentRunLoop()
process = MainProcess()
process.start()
while (!process.shouldExit && (runLoop.runMode(NSDefaultRunLoopMode, beforeDate: NSDate(timeIntervalSinceNow: 2)))) {
// do nothing
}
}
As Martin points out, you can use NSDate.distantFuture() as NSDate instead of NSDate(timeIntervalSinceNow: 2). (The cast is necessary because the distantFuture() method signature indicates it returns AnyObject.)
If you need to access CLI arguments see this answer. You can also return exit codes using exit().

A minimal Swift 3 implementation of Aaron Brager's solution, which simply combines autoreleasepool and RunLoop.current.run(...) until you break the loop:
var shouldExit = false
doSomethingAsync() { _ in
defer {
shouldExit = true
}
}
autoreleasepool {
var runLoop = RunLoop.current
while (!shouldExit && (runLoop.run(mode: .defaultRunLoopMode, before: Date.distantFuture))) {}
}

I think CFRunLoop is much easier than NSRunLoop in this case:
func main() {
/**** YOUR CODE START **/
let group = dispatch_group_create()
let queue = dispatch_queue_create("", DISPATCH_QUEUE_CONCURRENT)
for chunk in chunks {
dispatch_group_async(group, queue) {
worker(chunk)
}
}
dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
/**** END **/
}
let runloop = CFRunLoopGetCurrent()
CFRunLoopPerformBlock(runloop, kCFRunLoopDefaultMode) { () -> Void in
dispatch_async(dispatch_queue_create("main", nil)) {
main()
CFRunLoopStop(runloop)
}
}
CFRunLoopRun()

Related

How to get concurrency when using AsyncLines

I'm trying to use AsyncLineSequence with Process to execute many instances of a shell script at the same time. The issue I'm seeing is that with my usage of AsyncLineSequence I'm not seeing the output of the Process invocations interweaved like I would expect. It feels like there is something fundamental I am misunderstanding as this seems like it should work to me.
Here's a reproduction in a playground
import Cocoa
DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
exit(EXIT_SUCCESS)
}
func run(label: String) throws {
let process = Process()
process.executableURL = URL(fileURLWithPath: "/usr/bin/yes")
let pipe = Pipe()
process.standardOutput = pipe
Task {
for try await _ in pipe.fileHandleForReading.bytes.lines {
print(label)
}
}
try process.run()
}
Task {
try run(label: "a")
}
Task {
try run(label: "b")
}
The above will print only a or b but never both. If I change to not use AsyncLineSequence like this
import Cocoa
DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
exit(EXIT_SUCCESS)
}
func run(label: String) throws {
let process = Process()
process.executableURL = URL(fileURLWithPath: "/usr/bin/yes")
let pipe = Pipe()
process.standardOutput = pipe
pipe.fileHandleForReading.readabilityHandler = { _ in
print(label)
}
try process.run()
}
Task {
try run(label: "a")
}
Task {
try run(label: "b")
}
The as and bs are both printed interleaved.
To add to my confusion if I use URLSession to get async lines by reading an arbitrary file it does interleave the print statements of a and b as I'd expect
import Cocoa
DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
exit(EXIT_SUCCESS)
}
Task {
for try await _ in try await URLSession.shared.bytes(from: URL(fileURLWithPath: "/usr/bin/yes")).0.lines {
print("a")
}
}
Task {
for try await _ in try await URLSession.shared.bytes(from: URL(fileURLWithPath: "/usr/bin/yes")).0.lines {
print("b")
}
}
If I replace URLSession with FileHandle in the above, I am back to no interleaving: all of one file is read, followed by the next.
import Cocoa
DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
exit(EXIT_SUCCESS)
}
Task {
for try await _ in try FileHandle(forReadingFrom: URL(fileURLWithPath: "/usr/bin/yes")).bytes.lines {
print("a")
}
}
Task {
for try await _ in try FileHandle(forReadingFrom: URL(fileURLWithPath: "/usr/bin/yes")).bytes.lines {
print("b")
}
}

When I did this (for 10 seconds rather than 2, and in an app rather than a Playground), I did see them jumping back and forth.
Admittedly, it was not one-for-one interleaving (it was lots of “a”s followed by lots of “b”s, and then the process repeats). But there is no reason it would interleave perfectly one-for-one between the two processes, because while lines emits an asynchronous sequence of lines, behind the scenes it is likely reading chunks of output from the pipe, not really consuming it line by line, which would be very inefficient. (And, IMHO, it’s interesting that the URLSession behavior is different, but not terribly surprising.) And you effectively have two processes racing, so there is no reason to expect a graceful, alternating, behavior between the two.
If you replace yes with a program that waits a little between lines of output (e.g., I had it wait for 0.01 seconds between each line of output), then you will see it interleave a bit more frequently. Or, when I added an actor to keep track of which process last emitted a line of output, that was enough to trigger an immediate back-and-forth, processing one line from each yes output alternately.
You might also want to consider the implication of running these two loops with Task { ... }, as that will run each “operation asynchronously as part of a new top-level task on behalf of the current actor” [emphasis added]. You might consider detached tasks or separate actors (to reduce contention on the current actor handling both loops). In my tests, it did not change the results too dramatically, but your mileage may vary. Regardless, it is something to be aware of.
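The two suggestions above (detached tasks, and an actor that records which source emitted last) can be sketched roughly as follows. This is not the code used in the tests described in this answer: `EmissionLog` and `emit` are hypothetical names, and a plain counting loop with `Task.yield()` stands in for reading lines from a `Process` pipe so that the sketch runs anywhere. The number of switches between labels depends on scheduling and varies between runs.

```swift
// Hypothetical actor that counts lines per label and how often the
// emitting label changes from one recorded line to the next.
actor EmissionLog {
    private(set) var counts: [String: Int] = [:]
    private(set) var switches = 0
    private var last: String?

    func record(_ label: String) {
        counts[label, default: 0] += 1
        if let previous = last, previous != label { switches += 1 }
        last = label
    }
}

// Stand-in for `for try await _ in pipe.fileHandleForReading.bytes.lines`.
// Task.detached means the loop does not inherit the caller's actor.
func emit(_ label: String, lines: Int, into log: EmissionLog) -> Task<Void, Never> {
    Task.detached {
        for _ in 0..<lines {
            await log.record(label)
            await Task.yield()   // give the other loop a chance to run
        }
    }
}

let log = EmissionLog()
let taskA = emit("a", lines: 1_000, into: log)
let taskB = emit("b", lines: 1_000, into: log)

// Wait for both loops before inspecting the results.
await taskA.value
await taskB.value

let counts = await log.counts
let switches = await log.switches
print("a:", counts["a"] ?? 0, "b:", counts["b"] ?? 0, "switches:", switches)
```

Both loops always record all of their lines; only the degree of interleaving differs from run to run, which matches the point that there is no reason to expect a strict one-for-one alternation.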

Synchronize nested async network requests inside a while loop by using Semaphores

I have a func that gets a list of Players. When I fetch the players, I need to show only those who belong to the current Team, so I display a filtered subset of the original list. Before making the request I don't know how many players belong to the Team selected by the User, so I may need to make additional requests until I can display at least 10 rows of Players in the TableView. The User can request more players by pulling up from the bottom of the TableView. To do this I call a first async request func which, inside a while loop, calls another nested async request func. Here is some code to give you an idea of what I am trying to do:
let semaphore = DispatchSemaphore(value: 0)
func getTeamPlayersRequest() {
service.getTeamPlayers(...)
{
(result) in
switch result
{
case .success(let playersModel):
if let validCurrentPage = currentPageTmp ,
let validTotalPages = totalPagesTmp ,
let validNextPage = self.getTeamPlayersListNextPage()
{
while self.playersToShowTemp.count < 10 && self.currentPage < validTotalPages
{
self.currentPage = validNextPage //global var
self.fetchMorePlayers()
self.semaphore.wait() //global semaphore
}
}
case .failure(let error):
//some code...
}
})
}
private func fetchMorePlayers(){
// Completion handler of the following function is never called..
service.getTeamPlayers(requestedPage: currentPage, completion: {
(result) in
switch result
{
case .success(let playersModel):
if let validPlayerList = playersList,
let validPlayerListData = validPlayerList.data,
let validTeamModel = self.teamPlayerModel,
let validNextPage = self.getTeamPlayersListNextPage()
{
for player in validPlayerListData
{
if ( validTeamModel.id == player.team?.id)
{
self.playersToShowTemp.append(player)
}
}
}
self.currentPage = validNextPage
self.semaphore.signal() //global semaphore
case .failure(let error):
//some code...
}
}
}
I have tried both DispatchGroup and a semaphore, but I don't understand what I am doing wrong. I debugged the code and saw that the first async call gets executed on a different queue (not the main queue) and a different thread. The nested async call gets executed on a different thread, but I don't know if it is on the same concurrent queue as the first async call.
The completion handler of the nested call is never called. Does anyone know why? Is self.semaphore.wait(), even though it is executed after fetchMorePlayers() returns, preventing the nested async completion handler from being called?
I also noticed in the debugger that completion() in the Xcode variables window carries the note "swift partial apply forwarder for closure #1".

If we inline the function call in your loop, it looks something like this:
while self.playersToShowTemp.count < 10 && self.currentPage < validTotalPages
{
self.currentPage = validNextPage //global var
nbaService.getTeamPlayers(requestedPage: currentPage, completion: { ... })
self.semaphore.wait() //global semaphore
}
So nbaService.getTeamPlayers schedules a request, probably on the main DispatchQueue and immediately returns. Then you call wait on your semaphore, which blocks, probably before GCD even tries to run the task scheduled by nbaService.getTeamPlayers.
That's a problem on DispatchQueue.main, which is a serial queue. It has to be a serial queue for UI updates to work. What normally happens is that on some iteration of the run loop you make a request and return. Control bubbles back up to the run loop, which checks for more events and queued tasks. In this case, when your completion handler in getTeamPlayersRequest is waiting to be run, the run loop (via GCD) executes it for that iteration. Then you block the main thread, so the run loop can't continue. If you do need to block, always do it on a different DispatchQueue, preferably a .concurrent one.
There is sometimes confusion about what .async does. It only means "run this later and right now return control back to the caller". That's all. It does not guarantee that your closure will run concurrently. It merely schedules it to be run later (possibly soon) on whatever DispatchQueue you called it on. If that queue is a serial queue, the closure waits its turn behind everything already queued there. If it's a concurrent queue (i.e. one whose attributes you specifically set to include .concurrent), then it will run, possibly at the same time as other tasks on that same DispatchQueue.
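As a small illustration of the point about .async on a serial queue, here is a minimal sketch. The `Journal` class and the queue label are hypothetical names used only for this example.

```swift
import Foundation

// Small reference type so the closures can append to shared state.
// It is only ever touched from closures running on `serial`.
final class Journal: @unchecked Sendable {
    var entries: [String] = []
}

let serial = DispatchQueue(label: "example.serial")  // queues are serial by default
let journal = Journal()

// Both calls return immediately; the closures are only scheduled.
serial.async { journal.entries.append("first") }
serial.async { journal.entries.append("second") }   // queued behind "first"

// `sync` blocks the caller until its closure has run. Because the queue is
// FIFO, both async closures above have finished by the time it does.
let snapshot = serial.sync { () -> [String] in
    journal.entries.append("third")
    return journal.entries
}
print(snapshot)  // ["first", "second", "third"]

// Calling `serial.sync` from a closure that is already running on `serial`
// would deadlock: the queue cannot start the new closure until the current
// one returns. Waiting on a semaphore on the very queue that has to deliver
// the signal fails in the same way.
```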
To avoid blocking, use async chaining instead of a loop:
private func fetchMorePlayers(while condition: @autoclosure @escaping () -> Bool) {
    guard condition() else { return }
    nbaService.getTeamPlayers(requestedPage: currentPage, completion: {
        (result) in
        switch result
        {
        case .success(let playersModel):
            if let validPlayerList = playersList,
               let validPlayerListData = validPlayerList.data,
               let validTeamModel = self.teamPlayerModel,
               let validNextPage = self.getTeamPlayersListNextPage()
            {
                for player in validPlayerListData
                {
                    if ( validTeamModel.id == player.team?.id)
                    {
                        self.playersToShowTemp.append(player)
                    }
                }
            }
            self.currentPage = validNextPage
            // Chain to next call; the autoclosure re-wraps condition() so it is re-evaluated each time
            self.fetchMorePlayers(while: condition())
        case .failure(let error):
            //some code...
        }
    })
}
Then in getTeamPlayersRequest you can do this:
func getTeamPlayersRequest() {
service.getTeamPlayers(...)
{
(result) in
switch result
{
case .success(let playersModel):
if let validCurrentPage = currentPageTmp ,
let validTotalPages = totalPagesTmp ,
let validNextPage = self.getTeamPlayersListNextPage()
{
self.currentPage = validNextPage //global var
self.fetchMorePlayers(while: self.playersToShowTemp.count < 10 && self.currentPage < validTotalPages)
}
case .failure(let error):
//some code...
}
})
}
This avoids the need to block on a semaphore, because each subsequent request happens in the completion handler of the previously completed one. The only issue is if you need the completion handler in getTeamPlayersRequest to block while the fetchMorePlayers requests are in flight, because now it won't. If so, you can re-introduce the semaphore. In that case the guard statement in fetchMorePlayers becomes:
guard condition() else
{
self.semaphore.signal()
return
}
That way it only signals on the last completion handler in the chain. You may need to block in a different DispatchQueue though. I think if you need to block, you probably have something about your design that needs to be reconsidered.

If you find yourself reaching for semaphores, it is almost always a mistake. Semaphores are inefficient at best, and introduce deadlock risks if misused. Semaphores should generally be avoided. (Don't get me wrong: Semaphores can be useful in some very narrow use cases, but this is not one of them.)
Use asynchronous patterns. One simple approach might be to recursively call the routine, calling the completion handler when done:
func startFetching(completion: @escaping () -> Void) {
    fetchPlayers(page: 0, completion: completion)
}
private func fetchPlayers(page: Int, completion: @escaping () -> Void) {
    // prepare request
    // now perform request
    performRequest(...) { ...
        if let error = error {
            completion()
            return
        }
        ...
        if doesNeedMorePlayers {
            fetchPlayers(page: page + 1, completion: completion)
        } else {
            completion()
        }
    }
}
Personally, I would probably add another closure to emit the players retrieved as we go along, in the style of (if not actually) a Combine Publisher. Or, if you want to update the UI all at once at the very end, just pass the players retrieved thus far as an additional parameter in this recursive routine and pass the whole array back in the completion handler. But avoid globals or other state properties.
But the broader idea is to scrupulously avoid semaphores and instead embrace asynchronous patterns.
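To make the recursive pattern concrete, here is a minimal sketch that passes the players collected so far through the recursion instead of storing them in a property. Everything specific in it is assumed for illustration: the `Player` type, the fake `fetchPage` service (four players per page, alternating between two teams), and the parameter names are not from the original question.

```swift
import Foundation

struct Player: Sendable {
    let name: String
    let teamID: Int
}

// Hypothetical stand-in for the network service: answers asynchronously
// with four players per page, alternating between team 0 and team 1.
func fetchPage(_ page: Int, completion: @escaping @Sendable ([Player]) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.01) {
        completion((0..<4).map { Player(name: "p\(page)-\($0)", teamID: $0 % 2) })
    }
}

// Fetches page after page until enough matching players are collected or the
// pages run out, then calls `completion` exactly once with the whole array.
func fetchPlayers(teamID: Int, minimum: Int, totalPages: Int,
                  page: Int = 0, collected: [Player] = [],
                  completion: @escaping @Sendable ([Player]) -> Void) {
    guard collected.count < minimum, page < totalPages else {
        completion(collected)
        return
    }
    fetchPage(page) { players in
        let matching = players.filter { $0.teamID == teamID }
        // The next request starts only inside the previous completion handler.
        fetchPlayers(teamID: teamID, minimum: minimum, totalPages: totalPages,
                     page: page + 1, collected: collected + matching,
                     completion: completion)
    }
}

// Holder for the result so the demo can read it after the chain finishes.
final class ResultBox: @unchecked Sendable {
    var players: [Player] = []
}

let box = ResultBox()
let finished = DispatchGroup()
finished.enter()
fetchPlayers(teamID: 1, minimum: 10, totalPages: 20) { players in
    box.players = players
    finished.leave()
}

// Waiting here only keeps this command-line demo alive until the chain ends.
// An app would update its UI from the completion handler and never block.
finished.wait()
print(box.players.count)  // 10
```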

How to have a timeout when using DispatchGroup?

We can use a DispatchGroup in Swift to add many tasks to a group. The group waits until all the tasks are complete before proceeding to the code that follows.
let dg = DispatchGroup()
dg.notify(queue: .global()) {
// run code here on completion
}
Is there a way to add a timeout for this dispatchGroup in case the tasks are taking too long to complete?
[Edit]
I am aware that DispatchGroup.wait(timeout:) adds a timeout. But this makes it synchronously wait. Is there a way that it is asynchronous using the notify method, but still have a timeout?

If you want the DispatchGroup to have a timeout asynchronously try:
var dg: DispatchGroup? = DispatchGroup()
DispatchQueue.main.asyncAfter(deadline: .now() + 10) {
self.dg = nil // after time out group is removed asynchronously
}
dg?.notify(queue: .global()) {
// run code here on completion
}

Just to add an alternative way; to easily reuse this in multiple parts of your code, you could provide a DispatchGroup extension like this:
extension DispatchGroup {
    func notify(queue: DispatchQueue, timeout: TimeInterval, execute work: @escaping () -> ()) {
        // Runs `work` at most once: whichever path fires first clears the closure
        var selfDestructingCompletion: (() -> ())?
        selfDestructingCompletion = {
            selfDestructingCompletion = nil
            work()
        }
        // Normal completion path
        self.notify(queue: queue) {
            selfDestructingCompletion?()
        }
        // Timeout path, scheduled on the same queue
        queue.asyncAfter(deadline: .now() + timeout) {
            selfDestructingCompletion?()
        }
    }
}
Which then you would just use like this:
// 1. something you need to have for the dispatch group anyway
let someSerialQueue = DispatchQueue(label: "someQueue")
let group = DispatchGroup()
for someApi in apiList {
group.enter()
someApi.call() {
group.leave()
}
}
// 2. the new method, just with the timeout parameter
group.notify(queue: someSerialQueue, timeout: 5.0) {
// Run the completion for completing the tasks or for timeout in one place
}
This has the added benefits of:
having no thread issues as long as you pass a serial queue (not true for the accepted answer)
calling the completion only in one place when it's used
Just be sure to prefix the name of the method to some specific namespace of yours if you plan to insert this in a public library, to avoid clashes with other similar implementations of this extension.
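As a runnable variant of the same idea, the sketch below uses a fire-once flag instead of a self-clearing closure and tells the caller which path fired. The names `exampleNotify`, `OnceFlag`, and `Outcome` are hypothetical, and as above the approach is only safe when the queue passed in is serial.

```swift
import Foundation

// Fire-once marker. It is only read and written on the serial queue passed
// to `exampleNotify`, which is what makes the unchecked Sendable acceptable.
final class OnceFlag: @unchecked Sendable {
    var fired = false
}

extension DispatchGroup {
    // Hypothetical, prefixed name to avoid clashing with other extensions.
    func exampleNotify(queue: DispatchQueue, timeout: TimeInterval,
                       execute work: @escaping @Sendable (_ timedOut: Bool) -> Void) {
        let flag = OnceFlag()
        notify(queue: queue) {                     // normal completion path
            guard !flag.fired else { return }
            flag.fired = true
            work(false)
        }
        queue.asyncAfter(deadline: .now() + timeout) {  // timeout path
            guard !flag.fired else { return }
            flag.fired = true
            work(true)
        }
    }
}

// Records every invocation so the demo can show the handler ran only once.
final class Outcome: @unchecked Sendable {
    var calls: [Bool] = []
}

let serialQueue = DispatchQueue(label: "example.timeout")
let slowGroup = DispatchGroup()
let outcome = Outcome()

slowGroup.enter()
// The "task" finishes after 0.3 s, well past the 0.1 s timeout below.
DispatchQueue.global().asyncAfter(deadline: .now() + 0.3) { slowGroup.leave() }

slowGroup.exampleNotify(queue: serialQueue, timeout: 0.1) { timedOut in
    outcome.calls.append(timedOut)
}

Thread.sleep(forTimeInterval: 0.6)             // keep the demo alive past both events
let calls = serialQueue.sync { outcome.calls }
print(calls)  // [true]
```

The late `leave()` still triggers the group's own notification, but the flag is already set by then, so the handler is not called a second time.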

How do we implement wait / notify in Swift

In Java, we can do something like this:
synchronized(a) {
while(condition == false) {
a.wait(time);
}
//critical section ...
//do something
}
The above is a conditional synchronized block, that waits for a condition to become successful to execute a critical section.
When a.wait is executed (for say 100 ms), the thread exits critical section for that duration & some other critical section synchronized by object a executes, which makes condition true.
When the condition becomes successful, next time current thread enters the critical section and evaluates condition, loop exits and code executes.
Important points to note:
1. Multiple critical sections synchronized by same object.
2. A thread is not in critical section for only the duration of wait. Once wait comes out, the thread is in critical section again.
Is the below the proper way to do the same in Swift 4 using DispatchSemaphore?
while condition == false {
semaphore1.wait(duration)
}
semaphore1.wait()
//execute critical section
semaphore1.signal()
The condition could get modified by the time we enter critical section.
So, we might have to do something like below to achieve the Java behavior. Is there a simpler way to do this in Swift?
while true {
//lock
if condition == false {
//unlock
//sleep for sometime to prevent frequent polling
continue
} else {
//execute critical section
//...
//unlock
break
}
}

Semaphores
You can solve this problem with a DispatchSemaphore.
Let's look at this code.
Here we have a semaphore, storage property of type String? and a serial queue
let semaphore = DispatchSemaphore(value: 0)
var storage: String? = nil
let serialQueue = DispatchQueue(label: "Serial queue")
Producer
func producer() {
DispatchQueue.global().asyncAfter(deadline: .now() + 3) {
storage = "Hello world!"
semaphore.signal()
}
}
Here we have a function that:
Waits for 3 seconds
Writes "Hello world" into storage
Sends a signal through the semaphore
Consumer
func consumer() {
serialQueue.async {
semaphore.wait()
print(storage)
}
}
Here we have a function that
Waits for a signal from the semaphore
Prints the content of storage
Test
Now I'm going to run the consumer BEFORE the producer function
consumer()
producer()
Result
Optional("Hello world!")
How does it work?
func consumer() {
serialQueue.async {
semaphore.wait()
print(storage)
}
}
The body of the consumer() function is executed asynchronously into the serial queue.
serialQueue.async {
...
}
This is the equivalent of your synchronized(a). In fact, by definition, a serial queue runs one closure at a time.
The first line inside the closure is
semaphore.wait()
So the execution of the closure is stopped, waiting for the green light from the semaphore.
This is happening on a different queue (not the main one) so we are not blocking the main thread.
func producer() {
DispatchQueue.global().asyncAfter(deadline: .now() + 3) {
storage = "Hello world!"
semaphore.signal()
}
}
Now producer() is executed. It waits for 3 seconds on a queue different from the main one, then populates storage and sends a signal via the semaphore.
Finally consumer() receives the signal and can run the last line
print(storage)
Playground
If you want to run this code in Playground remember to
import PlaygroundSupport
and to run this line
PlaygroundPage.current.needsIndefiniteExecution = true

Answering my own question.
I used an instance of NSLock to lock and unlock, as in the pseudocode below.
while true {
//lock
if condition == false {
//unlock
//sleep for sometime to prevent frequent polling
continue
} else {
//execute critical section
//...
//unlock
break
}
}
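A different technique from the NSLock polling loop above, and one that maps more directly onto Java's wait/notify, is Foundation's NSCondition: its wait(until:) releases the lock while the thread is blocked and reacquires it before returning. The sketch below is only an illustration under that assumption; the `Gate` class and its method names are hypothetical.

```swift
import Foundation

final class Gate: @unchecked Sendable {
    private let condition = NSCondition()
    private var isOpen = false   // guarded by `condition`

    // Counterpart of the other synchronized block that makes the
    // condition true and then calls notifyAll().
    func open() {
        condition.lock()
        isOpen = true
        condition.broadcast()
        condition.unlock()
    }

    // Counterpart of:
    //   synchronized(a) { while (!condition) a.wait(time); /* critical section */ }
    func whenOpen<T>(recheckEvery interval: TimeInterval, _ criticalSection: () -> T) -> T {
        condition.lock()
        defer { condition.unlock() }
        while !isOpen {
            // Gives up the lock while blocked, takes it back before returning,
            // so the loop re-tests the condition with the lock held.
            _ = condition.wait(until: Date().addingTimeInterval(interval))
        }
        return criticalSection()  // still holding the lock here
    }
}

let gate = Gate()

// Another thread makes the condition true a little later.
DispatchQueue.global().asyncAfter(deadline: .now() + 0.2) { gate.open() }

let message = gate.whenOpen(recheckEvery: 0.1) { "entered critical section" }
print(message)  // entered critical section
```

Because the condition is tested and the critical section runs under the same lock, the condition cannot change between the check and the critical section, which is the concern raised in the question.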

How to stop DispatchGroup or OperationQueue waiting?

DispatchGroup and OperationQueue have methods wait() and waitUntilAllOperationsAreFinished() which wait for all operations in respective queues to complete.
But when I call cancelAllOperations, it just sets the isCancelled flag on every running operation and stops the queue from starting new operations; it still waits for the running operations to complete. Therefore running operations must be stopped from the inside, which is possible only if the operation is incremental or has an inner loop of some kind. When it is just one long external request (a web request, for example), the isCancelled flag is of no use.
Is there any way to stop the OperationQueue or DispatchGroup from waiting for the operations to complete if one of the operations decides that the whole queue is now outdated?
The practical case is mapping a request to a list of responders, where it is known that only one may answer. If that happens, the queue should stop waiting for the other operations to finish and unblock the thread.
Edit: DispatchGroup and OperationQueue usage is not obligatory, these are just tools I thought would fit.

OK, so I think I came up with something. Results are stable, I've just tested. The answer is just one semaphore :)
let semaphore = DispatchSemaphore(value: 0)
let group = DispatchGroup()
let queue = DispatchQueue(label: "map-reduce", qos: .userInitiated, attributes: .concurrent)
let stopAtFirst = true // false for all results to be appended into one array
let values: [U] = <some input values>
let mapper: (U) throws -> T? = <closure>
var result: [T?] = []
// A dedicated serial queue guards `result`; the global queue is concurrent and would not
let resultQueue = DispatchQueue(label: "map-reduce.result")
for value in values {
    queue.async(group: group) {
        do {
            let res = try mapper(value)
            // appending must always be thread-safe, otherwise you end up with race condition and unstable results
            resultQueue.sync {
                result.append(res)
            }
            if stopAtFirst && res != nil {
                semaphore.signal()
            }
        } catch let error {
            print("Could not map value \"\(value)\" to mapper \(mapper): \(error)")
        }
    }
}
group.notify(queue: queue) { // this must be declared exactly after submitting all tasks, otherwise notification fires instantly
    semaphore.signal()
}
if semaphore.wait(timeout: .now() + 5) == .timedOut {
    print("MapReduce timed out on values \(values)")
}
