When learning OperationQueue: is it possible to use OperationQueue instead of a GCD barrier?
Here is the case:
Upload 3 images, then upload another 3 images.
With a barrier, GCD works perfectly:
func workFlow() {
    let queue = DispatchQueue(label: "test.concurrent.queue", qos: .background, attributes: .concurrent, autoreleaseFrequency: .workItem)

    queue.async {
        self.uploadImg(idx: "A_0")
    }
    queue.async {
        Thread.sleep(forTimeInterval: 2)
        self.uploadImg(idx: "A_1")
    }
    queue.async {
        self.uploadImg(idx: "A_2")
    }
    queue.async(qos: queue.qos, flags: .barrier) {
        print("group A done")
    }
    print("A: should not be hanged")

    queue.async {
        self.uploadImg(idx: "B_0")
    }
    queue.async {
        self.uploadImg(idx: "B_1")
    }
    queue.async {
        self.uploadImg(idx: "B_2")
    }
    queue.async(qos: queue.qos, flags: .barrier) {
        print("group B done")
    }
    print("B: should not be hanged")
}

func uploadImg(idx info: String) {
    Thread.sleep(forTimeInterval: 1)
    print("img \(info) uploaded")
}
With OperationQueue, however, there is a flaw: the main queue gets blocked. Just check the prints of
"A/B: should not be hanged"
lazy var uploadQueue: OperationQueue = {
    var queue = OperationQueue()
    queue.name = "upload queue"
    queue.maxConcurrentOperationCount = 5
    return queue
}()

func workFlow() {
    let one = BlockOperation {
        self.uploadImg(idx: "A_0")
    }
    let two = BlockOperation {
        Thread.sleep(forTimeInterval: 3)
        self.uploadImg(idx: "A_1")
    }
    let three = BlockOperation {
        self.uploadImg(idx: "A_2")
    }
    uploadQueue.addOperations([one, two, three], waitUntilFinished: true)
    print("A: should not be hanged")
    uploadQueue.addOperation {
        print("group A done")
    }

    let four = BlockOperation {
        self.uploadImg(idx: "B_0")
    }
    let five = BlockOperation {
        self.uploadImg(idx: "B_1")
    }
    let six = BlockOperation {
        self.uploadImg(idx: "B_2")
    }
    uploadQueue.addOperations([four, five, six], waitUntilFinished: true)
    print("B: should not be hanged")
    uploadQueue.addOperation {
        print("group B done")
    }
}
How to do it better with OperationQueue?
If you do not want the operations added to the queue to block the current thread, waitUntilFinished must be false. If you set it to true, it blocks the current thread until the added operations finish.
Obviously, if you do not wait, you will not block the main thread, but you will also lose the barrier behavior. But iOS 13 and macOS 10.15 introduced addBarrierBlock. If you really need barriers and must support earlier OS versions, you will have to use dependencies. But if you were previously using GCD barriers simply to constrain the degree of concurrency, then maxConcurrentOperationCount might render the barrier moot. It all depends upon why you were using barriers with these uploads/downloads. (It is a little unusual to see barriers on upload/download queues, as it reduces efficiency.)
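For example, here is a minimal sketch of the addBarrierBlock approach; `uploadImg` just sleeps, as in the question, and stands in for a real upload:

```swift
import Foundation

// A minimal sketch of the addBarrierBlock approach (iOS 13+ / macOS 10.15+).
// `uploadImg` is a stand-in for a real (synchronous) upload.
let uploadQueue = OperationQueue()
uploadQueue.name = "upload queue"
uploadQueue.maxConcurrentOperationCount = 5

func uploadImg(idx: String) {
    Thread.sleep(forTimeInterval: 0.1)
    print("img \(idx) uploaded")
}

for i in 0..<3 { uploadQueue.addOperation { uploadImg(idx: "A_\(i)") } }
uploadQueue.addBarrierBlock { print("group A done") }  // runs only after A_0…A_2 finish

for i in 0..<3 { uploadQueue.addOperation { uploadImg(idx: "B_\(i)") } }
uploadQueue.addBarrierBlock { print("group B done") }  // runs only after B_0…B_2 finish

print("not blocked")  // addBarrierBlock returns immediately
uploadQueue.waitUntilAllOperationsAreFinished()  // for this command-line demo only
```

Unlike addOperations(_:waitUntilFinished: true), none of this blocks the calling thread; the barrier blocks simply wait for the preceding operations within the queue itself.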
How to do it better with OperationQueue?
I assume that uploadImg performs its network request synchronously. I would refactor it into its own Operation subclass that does the necessary Operation KVO, such as shown here. That example wraps a download task in an operation, but you can do the same with upload or data tasks, too (though the memory impact with data tasks is much greater).
But it is always advisable to avoid synchronous network requests, (a) to make sure you do not tie up worker threads; and (b) to make the requests cancelable.
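A minimal sketch of such an asynchronous Operation subclass follows. The URLSession details are omitted; `UploadOperation` is hypothetical and just hops to a background queue as a stand-in for a real upload task whose completion handler would call finish():

```swift
import Foundation

// Minimal asynchronous Operation subclass doing the isExecuting/isFinished KVO dance.
class AsyncOperation: Operation {
    private let stateQueue = DispatchQueue(label: "async-op.state")
    private var _executing = false
    private var _finished = false

    override var isAsynchronous: Bool { true }
    override var isExecuting: Bool { stateQueue.sync { _executing } }
    override var isFinished: Bool { stateQueue.sync { _finished } }

    override func start() {
        guard !isCancelled else { finish(); return }
        willChangeValue(forKey: "isExecuting")
        stateQueue.sync { _executing = true }
        didChangeValue(forKey: "isExecuting")
        main()
    }

    // Subclasses kick off their async work in main() and call finish() when done.
    override func main() { finish() }

    func finish() {
        willChangeValue(forKey: "isExecuting")
        willChangeValue(forKey: "isFinished")
        stateQueue.sync { _executing = false; _finished = true }
        didChangeValue(forKey: "isExecuting")
        didChangeValue(forKey: "isFinished")
    }
}

// Hypothetical upload operation; a real one would start a URLSession upload
// task in main() and call finish() in that task's completion handler.
class UploadOperation: AsyncOperation {
    let idx: String
    init(idx: String) { self.idx = idx }
    override func main() {
        DispatchQueue.global().async {
            print("img \(self.idx) uploaded")
            self.finish()
        }
    }
}
```

Because the operation does not report isFinished until the asynchronous work completes, maxConcurrentOperationCount and barriers now constrain the uploads themselves, not merely the blocks that launched them.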
Related
I want to speed up some process, so I wrote a Swift CLI script that processes thousands of files in parallel and writes the result for each file into a single output file. (The order of the files does not really matter.)
So I wrote the code below, and it works in Xcode unit tests (even with a list of approx. 1200 files!). However, when I execute the program from the command line, outside Xcode and with the same list of files, it never ends. It looks like it gets stuck near the end.
I read that sometimes spawning too many threads will cause a program to stop because it runs out of resources, but I thought DispatchQueue.concurrentPerform would take care of that... I have no clue why this works in XCTest and does not work in the terminal.
I have tried the DispatchGroup and semaphore approaches and both have the same problem.
Any help is highly appreciated.
let filePaths: [String] = [/* thousands of file paths to process */]
let group = DispatchGroup()
let concurrentQueue = DispatchQueue(label: "my.concurrent.queue", qos: .userInitiated, attributes: .concurrent)
let serialQueue = DispatchQueue(label: "my.serial.queue", qos: .userInitiated)

group.enter()
concurrentQueue.async {
    DispatchQueue.concurrentPerform(iterations: filePaths.count) { (fileIndex) in
        let filePath = filePaths[fileIndex]
        let result = self.processFile(path: filePath)
        group.enter()
        serialQueue.async {
            self.writeResult(result)
            group.leave()
        }
    }
    group.leave()
}
group.wait()
First, a few simplifications:
You have code that is
group.enter()
serialQueue.async {
    self.writeResult(result)
    group.leave()
}
That can be simplified to:
serialQueue.async(group: group) {
    self.writeResult(result)
}
Consider:
group.enter()
concurrentQueue.async {
    DispatchQueue.concurrentPerform(iterations: filePaths.count) { (fileIndex) in
        ...
    }
    group.leave()
}
That concurrentQueue is redundant, because concurrentPerform blocks the thread it is called from until all of its iterations are done. This can be simplified to:
DispatchQueue.concurrentPerform(iterations: filePaths.count) { (fileIndex) in
    ...
}
That reduces your code to:
let group = DispatchGroup()
let writeQueue = DispatchQueue(label: "serial.write.queue", qos: .userInitiated)

DispatchQueue.concurrentPerform(iterations: filePaths.count) { [self] index in
    let result = processFile(path: filePaths[index])
    writeQueue.async(group: group) {
        writeResult(result)
    }
}
group.wait()
That raises the question of why you are dispatching asynchronously to a serial queue for the write operations at all. That can introduce problems (e.g., if it gets backlogged, you will be holding all unwritten result values in memory at the same time).
One option is to write synchronously (you have to wait for the write operations in the end, anyway):
let writeQueue = DispatchQueue(label: "serial.write.queue", qos: .userInitiated)

DispatchQueue.concurrentPerform(iterations: filePaths.count) { [self] index in
    let result = processFile(path: filePaths[index])
    writeQueue.sync {
        writeResult(result)
    }
}
Or, if writeResult is itself thread-safe, you can probably just write from the various concurrent threads themselves:

DispatchQueue.concurrentPerform(iterations: filePaths.count) { [self] index in
    let result = processFile(path: filePaths[index])
    writeResult(result)
}
import Dispatch

class SynchronizedArray<T> {
    private var array: [T] = []
    private let accessQueue = DispatchQueue(label: "SynchronizedArrayAccess", attributes: .concurrent)

    var get: [T] {
        accessQueue.sync {
            array
        }
    }

    func append(newElement: T) {
        accessQueue.async(flags: .barrier) {
            self.array.append(newElement)
        }
    }
}
If I run the following code, 10,000 elements are appended to the array as expected even if I am reading concurrently:
DispatchQueue.concurrentPerform(iterations: 10000) { i in
    _ = threadSafeArray.get
    threadSafeArray.append(newElement: i)
}
But when I do this, it never comes close to adding 10,000 elements (it only added 92 elements on my computer the last time I ran it):
let concurrent = DispatchQueue(label: "com.concurrent", attributes: .concurrent)

for i in 0..<10000 {
    concurrent.async {
        _ = threadSafeArray.get
        threadSafeArray.append(newElement: i)
    }
}
Why does the former work, and why doesn't the latter work?
It's good that you found a solution to the thread explosion. See the discussion of thread explosion in WWDC 2015's Building Responsive and Efficient Apps with GCD and again in WWDC 2016's Concurrent Programming With GCD in Swift 3.
That having been said, DispatchSemaphore is a bit of an anti-pattern nowadays, given the presence of concurrentPerform (or OperationQueue with its maxConcurrentOperationCount, or Combine with its maxPublishers). All of these manage degrees of concurrency more elegantly than dispatch semaphores.
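For instance, a sketch of constraining concurrency with OperationQueue rather than a semaphore (the work closure is just a placeholder):

```swift
import Foundation

// Sketch: cap in-flight work at 8 with maxConcurrentOperationCount
// instead of a DispatchSemaphore(value: 8).
let queue = OperationQueue()
queue.maxConcurrentOperationCount = 8  // at most 8 operations run at once

for i in 0..<10_000 {
    queue.addOperation {
        // placeholder for the real per-item work
        _ = i * i
    }
}
queue.waitUntilAllOperationsAreFinished()  // or hang completion work off a barrier
```

The queue enforces the limit for you: there is no wait/signal pair to misplace, and no risk of leaving the loop while work is still outstanding if you wait on (or add a barrier to) the queue itself.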
All that having been said, a few observations on your semaphore pattern:
When using this DispatchSemaphore pattern, you generally put the wait before the concurrent.async { ... } (because, as written, you're getting nine concurrent operations, not eight, which is a bit misleading).
The deeper problem here is that you've diminished the count issue, but it still persists. Consider:
let threadSafeArray = SynchronizedArray<Int>()
let concurrent = DispatchQueue(label: "com.concurrent", attributes: .concurrent)
let semaphore = DispatchSemaphore(value: 8)

for i in 0..<10000 {
    semaphore.wait()
    concurrent.async {
        threadSafeArray.append(newElement: i)
        semaphore.signal()
    }
}
print(threadSafeArray.get.count)
When you leave the for loop, up to eight of the async tasks on concurrent can still be running, and the count (unsynchronized with respect to the concurrent queue) can still be less than 10,000. You have to add another concurrent.async(flags: .barrier) { ... }, which is just a second layer of synchronization. E.g.:
let semaphore = DispatchSemaphore(value: 8)

for i in 0..<10000 {
    semaphore.wait()
    concurrent.async {
        threadSafeArray.append(newElement: i)
        semaphore.signal()
    }
}
concurrent.async(flags: .barrier) {
    print(threadSafeArray.get.count)
}
Or you can use a DispatchGroup, the classical mechanism for determining when a series of asynchronously dispatched blocks finish:
let semaphore = DispatchSemaphore(value: 8)
let group = DispatchGroup()

for i in 0..<10000 {
    semaphore.wait()
    concurrent.async(group: group) {
        threadSafeArray.append(newElement: i)
        semaphore.signal()
    }
}
group.notify(queue: .main) {
    print(threadSafeArray.get.count)
}
Using concurrentPerform eliminates the need for either of these patterns, because it won't continue execution until all of the concurrent tasks are done. (It will also automatically optimize the degree of concurrency for the number of cores on your device.)
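A sketch of the concurrentPerform equivalent, using the question's SynchronizedArray (reproduced here so the snippet stands alone):

```swift
import Dispatch

// The question's SynchronizedArray, for reference (reader/writer pattern).
class SynchronizedArray<T> {
    private var array: [T] = []
    private let accessQueue = DispatchQueue(label: "SynchronizedArrayAccess", attributes: .concurrent)
    var get: [T] { accessQueue.sync { array } }
    func append(newElement: T) {
        accessQueue.async(flags: .barrier) { self.array.append(newElement) }
    }
}

// concurrentPerform blocks until every iteration has returned, so no group,
// semaphore, or barrier is needed to know when the work is done. It also caps
// the worker threads at roughly the core count, avoiding thread explosion.
let threadSafeArray = SynchronizedArray<Int>()
DispatchQueue.concurrentPerform(iterations: 10_000) { i in
    threadSafeArray.append(newElement: i)
}
// The synchronized getter runs after all queued barrier writes, so it sees them all.
print(threadSafeArray.get.count)  // 10000
</imports> omitted
```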
FWIW, a much better alternative to SynchronizedArray is to not expose the underlying array at all, and just implement whatever methods you want to expose, integrating the necessary synchronization. It makes for a cleaner call site and solves many issues.
For example, assuming you wanted to expose subscript operator and a count variable, you would do:
class SynchronizedArray<T> {
    private var array: [T]
    private let accessQueue = DispatchQueue(label: "com.domain.app.reader-writer", attributes: .concurrent)

    init(_ array: [T] = []) {
        self.array = array
    }

    subscript(index: Int) -> T {
        get { reader { $0[index] } }
        set { writer { $0[index] = newValue } }
    }

    var count: Int {
        reader { $0.count }
    }

    func append(newElement: T) {
        writer { $0.append(newElement) }
    }

    func reader<U>(_ block: ([T]) throws -> U) rethrows -> U {
        try accessQueue.sync { try block(array) }
    }

    func writer(_ block: @escaping (inout [T]) -> Void) {
        accessQueue.async(flags: .barrier) { block(&self.array) }
    }
}
This solves a variety of issues. For example, you can now do:
print(threadSafeArray.count) // get the count
print(threadSafeArray[500]) // get the 500th item
You can now also do things like:
let average = threadSafeArray.reader { array -> Double in
    let sum = array.reduce(0, +)
    return Double(sum) / Double(array.count)
}
But, bottom line, when dealing with collections (or any mutable object), you invariably do not want to expose the mutable object itself, but rather write your own synchronized methods for common operations (subscripts, count, removeAll, etc.), and possibly also expose a reader/writer interface for those cases where the app developer needs a broader synchronization mechanism.
(FWIW, these changes to SynchronizedArray apply to both the semaphore and concurrentPerform scenarios; it is just that the semaphore happens to manifest the problem in this case.)
Needless to say, you would generally want more work being done on each thread, too: as modest as the context-switching overhead is, it is likely enough here to offset any advantage gained from parallel processing. (But I understand that this was likely just a conceptual demonstration of the problem, not a proposed implementation.) Just an FYI for future readers.
It seems I was experiencing thread explosion: 82 threads were being created and the app ran out of threads. The solution I used is a semaphore to limit the number of threads:
let semaphore = DispatchSemaphore(value: 8)
let concurrent = DispatchQueue(label: "com.concurrent", attributes: .concurrent)

for i in 0..<10000 {
    concurrent.async {
        _ = threadSafeArray.get
        threadSafeArray.append(newElement: i)
        semaphore.signal()
    }
    semaphore.wait()
}
Edit: Rob's answer explains some issues with the above code.
In the following code, is it safe to append to an array? Is the order guaranteed to be maintained?
var processedData: [SomeType] = []
let dispatchGroup = DispatchGroup()

for _ in 0..<N {
    dispatchGroup.enter()
    startSomeAsyncTaskXYZ { (data, error) in
        // handle error and process data
        // add processed data to an array
        processedData.append(..)
        dispatchGroup.leave()
    }
}
dispatchGroup.notify(queue: .main) {
    // update UI
}
To stick with DispatchGroup while preserving the desired asynchronous nature and the expected ordering, make your array an array of optionals and populate it in whatever order the tasks complete:
var processedData: [SomeType?] = Array(repeating: nil, count: N)
let dispatchGroup = DispatchGroup()

for idx in 0..<N {
    dispatchGroup.enter()
    startSomeAsyncTaskXYZ { (data, error) in
        // Ensure we always .leave() after we're done
        // handling the completion of the task
        defer { dispatchGroup.leave() }

        guard let data = data,
              error == nil else {
            // TODO: Actual error handling
            return
        }

        // This needs to be .sync now (not .async) to ensure
        // the deferred dispatchGroup.leave() is not called
        // until *after* we've updated the array
        DispatchQueue.main.sync {
            processedData[idx] = SomeType(data: data)
        }
    }
}
dispatchGroup.notify(queue: .main) {
    // update UI
}
DispatchGroup has no relevance to execution order. It's merely a way of tracking completion of groups of tasks.
Whether the group's constituent tasks run async or sync, and in what order, is entirely dependent on how you use DispatchQueues.
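A small sketch of that distinction: the group only reports completion, while a serial queue is what provides the ordering.

```swift
import Dispatch

// The group only tells us when everything finished; the *serial* queue is what
// guarantees the tasks run in submission order.
let serial = DispatchQueue(label: "ordered.work")
let group = DispatchGroup()
var results: [Int] = []  // only touched on `serial`, so no extra locking needed

for i in 0..<5 {
    serial.async(group: group) {
        results.append(i)  // executes strictly in order: 0, 1, 2, 3, 4
    }
}
group.wait()  // for this command-line sketch; in an app you would use group.notify
serial.sync { print(results) }  // [0, 1, 2, 3, 4]
```

Had `serial` been a concurrent queue, the group would still fire only after all five tasks finished, but the array's contents could be in any order.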
Just went through this with my app. I have some async tasks that need to be done in order. The best way I found to accomplish that is with DispatchGroup and DispatchSemaphore.
The basic idea is that the dispatch group tracks fetches that complete in no particular order, while the semaphore forces a specific order where you need one.
Here is a good video that demonstrates it: https://www.youtube.com/watch?v=6rJN_ECd1XM&ab_channel=LetsBuildThatApp
Some sample code would look like this:
let dispatchQueue = DispatchQueue.global(qos: .userInitiated)
let dispatchGroup = DispatchGroup()
let semaphore = DispatchSemaphore(value: 0)

override func viewDidLoad() {
    super.viewDidLoad()

    dispatchQueue.async {
        // --------------
        // Family members
        // --------------
        // Get members and household information first (esp. payday time and time zone), then everything else
        self.dispatchGroup.enter()
        MPUser.loadFamilyMembers {
            print("We got us some fambly members!")
            self.dispatchGroup.leave()
            self.semaphore.signal()
        }
        // ^^^ Wait for the above code to finish ('signal') before moving on (in other words, get users first)
        self.semaphore.wait()

        // --------------
        // Household info
        // --------------
        self.dispatchGroup.enter()
        FamilyData.observeHouseholdInformation {
            self.dispatchGroup.leave()
            self.semaphore.signal()
        }
        // ^^^ Wait for the above code to finish ('signal') before moving on (get users first, then household info)
        self.semaphore.wait()

        // ---------------
        // Everything else
        // ---------------
        self.dispatchGroup.enter()
        Setup.observeProgress {
            self.dispatchGroup.leave()
        }
        self.dispatchGroup.enter()
        OutsideIncome.observeUserOutsideIncomeBudget {
            self.dispatchGroup.leave()
        }
        self.dispatchGroup.enter()
        MPUser.observeCurrentEarnings {
            self.dispatchGroup.leave()
        }

        self.dispatchGroup.notify(queue: .main) {
            let end = Date()
            print("\nAll functions completed in \(end.timeIntervalSince(self.start!).rounded(toPlaces: 2)) seconds!\n")
            self.sendUserToCorrectPage()
        }
    }
}
What am I doing wrong? In a playground it runs as it should, but as soon as I run it on the iOS simulator it returns the wrong sequence.
@objc func buttonTapped() {
    let group = DispatchGroup()
    let dispatchQueue = DispatchQueue.global(qos: .default)

    for i in 1...4 {
        group.enter()
        dispatchQueue.async {
            print("🔹 \(i)")
        }
        group.leave()
    }
    for i in 1...4 {
        group.enter()
        dispatchQueue.async {
            print("✅ \(i)")
        }
        group.leave()
    }
    group.notify(queue: DispatchQueue.main) {
        print("jobs done by group")
    }
}
Console Output:
I don't get it.
You should put the group.leave() statement inside the dispatchQueue.async block as well; otherwise it is executed synchronously, before the async block finishes execution.
@objc func buttonTapped() {
    let group = DispatchGroup()
    let dispatchQueue = DispatchQueue.global(qos: .default)

    for i in 1...4 {
        group.enter()
        dispatchQueue.async {
            print("🔹 \(i)")
            group.leave()
        }
    }
    for i in 1...4 {
        group.enter()
        dispatchQueue.async {
            print("✅ \(i)")
            group.leave()
        }
    }
    group.notify(queue: DispatchQueue.main) {
        print("jobs done by group")
    }
}
As Dávid said, properly employed dispatch groups only ensure that the notification takes place after all of the tasks finish, which you can achieve by calling leave from within the dispatched blocks, as he showed you. Alternatively, since your dispatched tasks are themselves synchronous, you don't have to manually enter and leave the group, but can use the group parameter of the async method:
let group = DispatchGroup()
let queue = DispatchQueue.global(qos: .default)

for i in 1...4 {
    queue.async(group: group) {
        print("🔹 \(i)")
    }
}
for i in 1...4 {
    queue.async(group: group) {
        print("✅ \(i)")
    }
}
group.notify(queue: .main) {
    print("jobs done by group")
}
Use group.enter() and group.leave() when calling some asynchronous method, but in the case of these print statements, you can just use async(group:execute:) as shown above.
Now, we've solved the problem where the "jobs done by group" block didn't wait for all of the dispatched tasks. But, because you're doing all of this dispatching to a concurrent queue (all the global queues are concurrent queues), you have no assurances that your tasks will be performed in the order that you requested. They're queued up in a strict FIFO manner, but because they're concurrent, you have no assurances when you'll hit the respective print statements.
If you need it to print the messages in order, you will have to use a serial queue. For example, if you create your own queue, in the absence of the .concurrent attribute, the following will create a serial queue:
// create serial queue
let queue = DispatchQueue(label: "...")

// but not your own concurrent queue:
//
//     let queue = DispatchQueue(label: "...", attributes: .concurrent)
//
// nor one of the global concurrent queues:
//
//     let queue = DispatchQueue.global(qos: .default)
//
And if you run the above code with this serial queue, you'll see what you were looking for:
🔹 1
🔹 2
🔹 3
🔹 4
✅ 1
✅ 2
✅ 3
✅ 4
jobs done by group
But, then, again, if you were using a serial queue, the group would be completely unnecessary (you could just add the "completion" task as yet another dispatched task at the end of the serial queue). I only show the use of serial queues as a way to avoid the race condition of dispatching eight tasks to a concurrent queue.
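A sketch of that serial-queue-only variant, with no group at all:

```swift
import Dispatch

// On a serial queue, tasks run one at a time in FIFO order, so the "completion"
// can simply be the last task enqueued; no DispatchGroup is needed.
let queue = DispatchQueue(label: "serial.demo")

for i in 1...4 { queue.async { print("first \(i)") } }
for i in 1...4 { queue.async { print("second \(i)") } }
queue.async { print("jobs done") }  // runs after all eight tasks above

queue.sync { }  // keep this command-line sketch alive until the queue drains
```

The trailing closure is guaranteed to run last only because the queue is serial; on a concurrent queue this trick does not work, which is exactly where groups or barriers come back in.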
I have code that goes like this:
myArray.forEach { item in
    concurentOperation(item)
}
Every item in the array goes through a concurrent operation function which runs on different threads; I'm not sure exactly which thread or how many, because the function is from a third-party library and out of my control. I need a way to find out when all operations are finished.
How can I do this?
Without modifying concurentOperation(), this is NOT possible, sorry ...
UPDATE for user @Scriptable:
The next snippet demonstrates why his solution doesn't work ...
import PlaygroundSupport
import Foundation
import Dispatch

PlaygroundPage.current.needsIndefiniteExecution = true

let pq = DispatchQueue(label: "print", qos: .background)

func dprint(_ items: Any...) {
    pq.async {
        let r = items.map { String(describing: $0) }.joined(separator: " ")
        print(r)
    }
}

func concurrentOperation<T>(item: T) { // dummy items
    DispatchQueue.global().async {
        // long-running operation
        for i in 0..<10000 {
            _ = sin(Double(i))
        }
        dprint(item, "done")
    }
}

let myArray = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
let g = DispatchGroup()

myArray.forEach { (item) in
    DispatchQueue.global().async(group: g) {
        concurrentOperation(item: item)
    }
}
g.notify(queue: DispatchQueue.main) {
    dprint("all jobs done???")
}
UPDATE 2
Without modifying the code of concurrentOperation(), in
DispatchQueue.global().async(group: g) {
    concurrentOperation(item: item)
}
the dispatch group is entered and immediately left, because concurrentOperation is an asynchronous function. If it were synchronous, the question would make no sense.
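If the library can be changed (or wrapped) to expose a completion handler, the group can be left at the right time. The `completion` parameter below is hypothetical; the real third-party function would need to offer some equivalent callback:

```swift
import Dispatch

// Hypothetical wrapper: the async work calls `completion` when it truly finishes.
func concurrentOperation(item: Int, completion: @escaping () -> Void) {
    DispatchQueue.global().async {
        // stand-in for the long-running third-party work
        _ = (0..<10_000).reduce(0, +)
        completion()
    }
}

let myArray = [1, 2, 3, 4, 5]
let group = DispatchGroup()

for item in myArray {
    group.enter()
    concurrentOperation(item: item) {
        group.leave()  // leave only when the async work has finished
    }
}
group.notify(queue: .main) {
    print("all jobs done")
}
```

The key difference from the snippet above is that leave() is now called from inside the work's completion handler, so the notification genuinely waits for the work rather than for the dispatch that merely launched it.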