Parallelism within concurrentPerform closure


I am looking to implement concurrency inside part of my app in order to speed up processing. The input array can be large, and I need to check multiple things about it. Here is some sample code.
EDITED:
So this is helpful for striding through the array, which was something else I was looking at doing, but I think the answers are drifting away from the original question, because I already have a DispatchQueue.concurrentPerform in the code.
Inside the outer loop, I was looking to run other for loops, because I have to look at the same data multiple times. The inputArray is an array of structs, so in the outer loop I am looking at one value in the struct, and in the inner loops I am looking at a different value in the struct. In the change below I turned the two inner for loops into function calls to make the code a bit clearer. In general, I want to make the funcA and funcB calls and wait until both are done before continuing in the main loop.
//assume the startValues and stop values will be within the bounds of the
//array and won't under/overflow
private func funcA(inputArray: [Int], startValue: Int, endValue: Int) -> Bool {
    for index in startValue...endValue {
        let dataValue = inputArray[index]
        if dataValue == 1_000_000 {
            return true
        }
    }
    return false
}
private func funcB(inputArray: [Int], startValue: Int, endValue: Int) -> Bool {
    for index in startValue...endValue {
        let dataValue = inputArray[index]
        if dataValue == 10 {
            return true
        }
    }
    return false
}
private func testFunc(inputArray: [Int]) {
    let dataIterationArray = Array(Set(inputArray))
    let syncQueue = DispatchQueue(label: "syncQueue")
    DispatchQueue.concurrentPerform(iterations: dataIterationArray.count) { index in
        //I want to do these two function calls starting roughly one after another,
        //to work them in parallel, but I want to wait until both are complete before
        //moving on. funcA is going to take much longer than funcB in this case,
        //just because there are more values to check.
        let funcAResult = funcA(inputArray: dataIterationArray, startValue: 10, endValue: 2_000_000)
        let funcBResult = funcB(inputArray: dataIterationArray, startValue: 5, endValue: 9)
        //Wait for both above to finish before continuing
        if funcAResult && funcBResult {
            print("Yup we are good!")
        } else {
            print("Nope")
        }
        //And then wait here until all of the loops are done before processing
    }
}

In your revised question, you contemplated a concurrentPerform loop where each iteration called funcA and then funcB and suggested that you wanted them “to work them in parallel”.
Unfortunately, that is not how concurrentPerform works. It runs the separate iterations in parallel, but the code within the closure should be synchronous and run sequentially. If the closure introduces additional parallelism, that will adversely affect how concurrentPerform can reason about how many worker threads it should use.
Before we consider some alternatives, let us see what will happen if funcA and funcB remain synchronous. In short, you will still enjoy parallel execution benefits.
Below, I logged this with “Points of Interest” intervals in Instruments, and you will see that funcA (in green) never runs concurrently with funcB (in purple) for the same iteration (i.e., for the same range of start and end indices). In this example, I am processing an array with 180 items, striding 10 items at a time, ending up with 18 iterations running on an iPhone 12 Pro Max with six cores:
But, as you can see, although funcB for a given range of indices will not start until funcA finishes for the same range of indices, it does not really matter, because we are still enjoying full parallelism on the device, taking advantage of all the CPU cores.
I contend that, given that we are enjoying parallelism, there is little benefit in making funcA and funcB run concurrently with respect to each other, too. Just let the individual iterations run parallel to each other, but let A and B run sequentially, and call it a day.
If you really want to have funcA and funcB run parallel with each other, as well, you will need to consider a different pattern. The concurrentPerform simply is not designed for launching parallel tasks that, themselves, are asynchronous. You could consider:
Have concurrentPerform launch, using my example, 36 iterations, half of which do funcA and half of which do funcB.
Or you might consider using OperationQueue with a reasonable maxConcurrentOperationCount (but you do not enjoy the dynamic limitation of the degree of concurrency to the device’s CPU cores).
Or you might use an async-await task group, which will limit itself to the cooperative thread pool.
But you will not want to have concurrentPerform have a closure that launches asynchronous tasks or introduces additional parallel execution.
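A minimal sketch of the first alternative, under some assumptions: the funcA and funcB below are simplified stand-ins that take a slice (hypothetical signatures, not the ones in the question), and checkBoth is a made-up name. Twice as many iterations are launched; even iterations run funcA and odd ones run funcB over the same range, so the closure itself stays synchronous.

```swift
import Dispatch

// Simplified stand-ins for the question's functions (hypothetical signatures).
func funcA(_ slice: ArraySlice<Int>) -> Bool { slice.contains(1_000_000) }
func funcB(_ slice: ArraySlice<Int>) -> Bool { slice.contains(10) }

/// Runs funcA and funcB over every `stride`-sized range, as separate iterations.
func checkBoth(_ array: [Int], stride: Int) -> (a: Bool, b: Bool) {
    precondition(stride > 0)
    let count = array.count
    let chunks = (count + stride - 1) / stride          // ranges to cover
    let syncQueue = DispatchQueue(label: "syncQueue")   // protects the two flags
    var foundA = false
    var foundB = false

    // Two iterations per range: even -> funcA, odd -> funcB.
    DispatchQueue.concurrentPerform(iterations: chunks * 2) { iteration in
        let start = (iteration / 2) * stride
        let slice = array[start ..< min(start + stride, count)]
        if iteration.isMultiple(of: 2) {
            let result = funcA(slice)
            syncQueue.sync { foundA = foundA || result }
        } else {
            let result = funcB(slice)
            syncQueue.sync { foundB = foundB || result }
        }
    }
    // concurrentPerform returns only after all iterations have finished.
    return (foundA, foundB)
}

let outcome = checkBoth(Array(0..<180), stride: 10)
print(outcome.a, outcome.b)
```

With 180 items and a stride of 10 this launches 36 iterations, matching the numbers used above. As noted, equality checks this cheap are unlikely to be faster in parallel; the sketch only shows the shape.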
And, as I discuss below, the example provided in the question is not a good candidate for parallel execution. Mere tests of equality are not computationally intensive enough to enjoy parallelism benefits. It will undoubtedly just be slower than the serial pattern.
My original answer, below, outlines the basic concurrentPerform considerations.
The basic idea is to “stride” through the values. So calculate how many “iterations” are needed and calculate the “start” and “end” index for each iteration:
private func testFunc(inputArray: [Int]) {
    DispatchQueue.global().async {
        let array = Array(Set(inputArray))
        let syncQueue = DispatchQueue(label: "syncQueue")
        // calculate how many iterations will be needed
        let count = array.count
        let stride = 10
        let (quotient, remainder) = count.quotientAndRemainder(dividingBy: stride)
        let iterations = remainder == 0 ? quotient : quotient + 1
        // now iterate
        DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
            // calculate the `start` and `end` indices
            let start = stride * iteration
            let end = min(start + stride, count)
            // now loop through that range
            for index in start ..< end {
                let value = array[index]
                print("iteration =", iteration, "index =", index, "value =", value)
            }
        }
        // you won't get here until they're all done; obviously, if you
        // want to now update your UI or model, you may want to dispatch
        // back to the main queue, e.g.,
        //
        // DispatchQueue.main.async {
        //     ...
        // }
    }
}
Note, if something is so slow that it merits concurrentPerform, you probably want to dispatch the whole thing to a background queue, too. Hence the DispatchQueue.global().async {…} shown above. You would probably want to add a completion handler to this method, now that it runs asynchronously, but I will leave that to the reader.
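For what it is worth, one possible shape for that completion handler, as a rough sketch. The completionQueue parameter and the choice to report the number of unique values are assumptions made for illustration; the loop body is a placeholder for the real work.

```swift
import Dispatch

/// Hypothetical variant of `testFunc` that reports back once the parallel loop is done.
func testFunc(inputArray: [Int],
              completionQueue: DispatchQueue = .main,
              completion: @escaping (_ uniqueCount: Int) -> Void) {
    DispatchQueue.global().async {
        let array = Array(Set(inputArray))
        let count = array.count
        let stride = 10
        let iterations = (count + stride - 1) / stride

        DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
            let start = stride * iteration
            let end = min(start + stride, count)
            for index in start ..< end {
                _ = array[index]   // placeholder for the real, CPU-heavy work
            }
        }

        // concurrentPerform has returned, so every iteration has finished;
        // hand the result to the caller on the queue it asked for.
        completionQueue.async { completion(count) }
    }
}
```

A caller on the main thread would typically leave completionQueue at its default and update the UI or model inside the closure.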
Needless to say, there are quite a few additional considerations:
The stride should be large enough to ensure there is enough work on each iteration to offset the modest overhead introduced by multithreading. Some experimentation is often required to empirically determine the best striding value.
The work done in each thread must be significant (again, to justify the multithreading overhead). I.e., simply printing values is obviously not enough. (Worse, print statements compound the problem by introducing a hidden synchronization.) Even building a new array with some simple calculation will not be sufficient. This pattern really only works if you are doing something very computationally intensive.
You have a “sync” queue, which suggests that you understand that you need to synchronize the combination of the results of the various iterations. That is good. I will point out, though, that you will want to minimize the total number of synchronizations you do. E.g. let’s say you have 1000 values and you end up doing 10 iterations, each striding through 100 values. You generally want to have each iteration build a local result and do a single synchronization for each iteration. Using my example, you should strive to end up with only 10 total synchronizations, not 1000 of them, otherwise excessive synchronization can easily negate any performance gains.
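To illustrate the last point, here is a small sketch in which each iteration accumulates a local result and synchronizes exactly once. The function name parallelSum is hypothetical, and summing integers is far too cheap to benefit from parallelism in practice; it only stands in for real work.

```swift
import Dispatch

/// Sums `values` in `stride`-sized chunks, one synchronization per chunk.
func parallelSum(of values: [Int], stride: Int) -> Int {
    precondition(stride > 0)
    let count = values.count
    let iterations = (count + stride - 1) / stride
    let syncQueue = DispatchQueue(label: "syncQueue")
    var total = 0

    DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
        let start = iteration * stride
        let end = min(start + stride, count)

        // Build the result for this chunk without touching shared state.
        var localSum = 0
        for index in start ..< end {
            localSum += values[index]
        }

        // One synchronization per iteration, not one per element.
        syncQueue.sync { total += localSum }
    }
    return total
}

print(parallelSum(of: Array(1...1000), stride: 100))   // prints 500500
```

With 1000 values and a stride of 100, this performs 10 synchronizations in total.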
Bottom line, making a routine execute in parallel is complicated and you can easily find that the process is actually slower than the serial rendition. Some processes simply don’t lend themselves to parallel execution. We obviously cannot comment further without understanding what your processes entail. Sometimes other technologies, such as Accelerate or Metal can achieve better results.

I will explain it here, since a comment is too small, but I will delete it later if it doesn't answer the question.
Instead of looping over iterations: dataIterationArray.count, base the number of iterations on the number of desired parallel streams of work, not on the array size. For example, if you want 3 streams of work, as you mentioned, then you should have 3 iterations, each processing an independent part of the work:
DispatchQueue.concurrentPerform(iterations: 3) { iteration in
    switch iteration {
    case 0:
        for i in 1...10 {
            print("i \(i)")
        }
    case 1:
        for j in 11...20 {
            print("j \(j)")
        }
    case 2:
        for k in 21...30 {
            print("k \(k)")
        }
    default:
        // never reached with 3 iterations, but a switch over Int must be exhaustive
        break
    }
}
The "And then wait here until all of the loops are done before processing" part happens automatically; this is what concurrentPerform guarantees.

Related

Looped delay question in Swift, what's the difference?

So basically I've been trying to mess around with loops and delays in Swift. I've found multiple answers about how to implement this properly. But I have one unanswered question.
Why does this delayed loop work:
for a in 1..<61 {
DispatchQueue.main.asyncAfter(deadline: .now() + Double(a)) {
print(a)
}
}
while this doesn't have any delay besides the very first one:
for a in 1..<61 {
DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
print(a)
}
}

Loops do not wait for the DispatchQueue. Calling asyncAfter on a DispatchQueue is like saying:
"I want this work to be scheduled on another queue (in this case the main queue, so the thread doesn't change) once some deadline has passed (how long until the work will happen)".
Since the loop does not wait, in the second example every block gets a deadline of roughly 1 second from now, so everything is executed after 1 second.
However, in the first case, each delay is offset by a different amount. The first iteration runs after 1 second, the next after 2, then 3, etc.
Note: Delays inside loops are not recommended. There are usually other solutions, such as using timers.
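One timer-based alternative, as a rough sketch: a single repeating DispatchSourceTimer fires once per interval and cancels itself after a fixed number of ticks, instead of scheduling many asyncAfter blocks up front. repeatEvery is a hypothetical helper name, and the queue is assumed to be serial (such as the main queue) so the tick counter is only touched from one place at a time.

```swift
import Dispatch

/// Calls `handler` with 1, 2, ..., `count`, once per `interval` seconds,
/// then cancels the timer and calls `completion`.
@discardableResult
func repeatEvery(_ interval: Double,
                 count: Int,
                 on queue: DispatchQueue,
                 handler: @escaping (Int) -> Void,
                 completion: @escaping () -> Void) -> DispatchSourceTimer {
    let timer = DispatchSource.makeTimerSource(queue: queue)
    var tick = 0
    timer.schedule(deadline: .now() + interval, repeating: interval)
    timer.setEventHandler {
        tick += 1
        handler(tick)
        if tick == count {
            timer.cancel()      // no further events are delivered after this
            completion()
        }
    }
    timer.resume()              // timer sources start out suspended
    return timer
}
```

In an app, something like `repeatEvery(1, count: 60, on: .main, handler: { print($0) }, completion: {})` would print one number per second, roughly like the first loop in the question.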

Bad practice for a method to call itself?

I've a question about good or bad practice.
I've created a function which will generate a random number. If the random number is equal to the previous random number, it should generate a new number. So my question is: is it bad practice to call the same method from within the method?
func getRandomNumber(){ //<-- Method name
let randomNumber = Int.random(in: 0..<allPlayers.count)
if lastRoundNumber == randomNumber{
getRandomNumber() //<-- Like this
}
print(randomNumber)
}
Or should I do this is another way? If yes, how?
So is it bad practice to call the same method from the current method like I've done in the code above? Thanks in advance.
If yes, why is it bad? And how can you do this to get a "better" code?

There is nothing wrong with having a function call itself. It’s called recursion. When not implemented properly, it can introduce some overhead, but sometimes it can be a very elegant solution.
That having been said, you might not want to do it like you have here. What if it guessed the same number three times before it got one that wasn’t equal to lastRoundNumber? You’d see four print statements for one new value. Do you really want that behavior? If you were going to implement getRandomNumber as a recursive function, at the very least I’d suggest inserting a return statement after it calls itself recursively, so that you don’t get print statements for the iterations where it ended up with the same value as lastRoundNumber.
That having been said, we often only reach for recursion (and the overhead that entails) when that implementation is appreciably more elegant or intuitive than the non-recursive rendition. But in this case, the non-recursive rendition is probably just as clear, and as such, we’d likely favor it over the recursive version. It might look like:
func getRandomNumber() {
    guard allPlayers.count > 1 else { return }
    var randomNumber: Int
    repeat {
        randomNumber = .random(in: 0..<allPlayers.count)
    } while randomNumber == lastRoundNumber
    print(randomNumber)
}
Note, I’m checking that you have more than one player to avoid the possibility of infinite loop.
But let's say there were 100 players. And let’s say you called this 100 times. Suppose it returned player 1, then player 2, then player 1 again, then player 2 again, repeating again and again, never returning any of players 3 through 100. This is unlikely, but it’s possible. Is that OK?
Often we want to return all players, but in a random order. In that case, you’d “shuffle” the list, e.g.
let players = (0..<allPlayers.count).shuffled()
That will ensure that you have an array of integer values, shuffled into random order, but never repeating any given number. That provides randomness while also ensuring that each value is returned only once.
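If it helps, here is a small sketch of how that shuffled list might be consumed one player at a time. ShuffledPicker is a hypothetical type, and reshuffling once everyone has had a turn is an assumption about the desired behavior.

```swift
/// Hands out each index in 0..<count exactly once per round, in random order.
struct ShuffledPicker {
    private let count: Int
    private var remaining: [Int] = []

    init(count: Int) {
        self.count = count
    }

    /// Returns the next index, or nil if there are no players at all.
    mutating func next() -> Int? {
        guard count > 0 else { return nil }
        if remaining.isEmpty {
            // start a new round with a fresh random order
            remaining = (0..<count).shuffled()
        }
        return remaining.removeLast()
    }
}

var picker = ShuffledPicker(count: 5)
var firstRound: [Int] = []
for _ in 0..<5 {
    if let index = picker.next() {
        firstRound.append(index)
    }
}
print(firstRound)   // some permutation of 0...4
```

Note that the last player of one round can still come up first in the next round; if that matters, the lastRoundNumber check would need to be applied when reshuffling.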
It just depends upon your desired behavior.

If you call a method from the same method, it is called recursion.
You can find here an explanation how recursion in swift works.
You should make sure that your method has an exit condition, so you are not stuck in your call.
Let's look at an example call of your method. lastRoundNumber is 1.
Your generated number is also 1. So it will call the method again. Then you will generate the number 2.
With print(randomNumber) you will get the following output:
2
1
This happens because the print statement is executed even if you call the method again.
So you need to rework your if statement as follows:
if lastRoundNumber == randomNumber {
    getRandomNumber() //<-- Like this
} else {
    print(randomNumber)
}
In this way, it will only print the last generated value.

Tail recursion when loading lot of items

I need to load a lot of small files from an api that allows me to load only one file at a time. As they are very small I start several downloads at a time. Depending on the result I start the next batch load.
For each request I use a observable and then combine several with combineLatest. After combineLatest I do a flatMap and concat a new call to the same function.
As abstraction I do this - pseudo code, not compiling:
func loadRecursively(items) -> Observable<XY> {
combineLatest(requestObservables)
.flatMap {
return loadRecursively(items-loadedItems)
}
}
This works perfectly in general.
The problem: This leads to a growing recursive tail, which is not cut off by compiler optimisation as it seems. So when loading some thousand files the stack will grow and finally the app will close.
How would I avoid the growing tail? Or in general how would I approach this problem with rx?

RxSwift has a concatMap operator (because people have faced the same problem) that lets you work through your Observables sequentially.
Simple example:
Observable.from([1, 2, 3, 4])
.concatMap(Observable.just)
.subscribe(onNext: {
print($0)
})
.disposed(by: bag)
Prints:
1
2
3
4

NSOperationQueue worse performance than single thread on computation task

My first question!
I am doing CPU-intensive image processing on a video feed, and I wanted to use OperationQueue. However, the results are absolutely horrible. Here's an example—let's say I have a CPU intensive operation:
var data = [Int].init(repeating: 0, count: 1_000_000)
func run() {
let startTime = DispatchTime.now().uptimeNanoseconds
for i in data.indices { data[i] = data[i] &+ 1 }
NSLog("\(DispatchTime.now().uptimeNanoseconds - startTime)")
}
It takes about 40ms on my laptop to execute. I time a hundred runs:
(1...100).forEach { i in run(i) }
They average about 42ms each, for about 4200ms total. I have 4 physical cores, so I try to run it on an OperationQueue:
var q = OperationQueue()
(1...100).forEach { i in
q.addOperation {
run(i)
}
}
q.waitUntilAllOperationsAreFinished()
Interesting things happen depending on q.maxConcurrentOperationCount:
concurrency   single operation   total
1             45ms               4500ms
2             100-250ms          8000ms
3             100-300ms          7200ms
4             250-450ms          9000ms
5             250-650ms          9800ms
6             600-800ms          11300ms
I use the default QoS of .background and can see that the thread priority is default (0.5). Looking at the CPU utilization with Instruments, I see a lot of wasted cycles (the first part is running it on main thread, the second is running with OperationQueue):
I wrote a simple thread queue in C and used that from Swift and it scales linearly with the cores, so I'm able to get my 4x speed increase. But what am I doing wrong with Swift?
Update: I think we have concluded that this is a legitimate bug in DispatchQueue. Then the question actually is what is the correct channel to ask about issues in DispatchQueue code?

You seem to measure the wall-clock time of each run execution. This does not seem to be the right metric. Parallelizing the problem does not signify that each run will execute faster... it just means that you can do several runs at once.
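To make the distinction concrete, here is a small sketch that times the whole batch rather than each run inside it. measureMilliseconds is a hypothetical helper and the loop body is arbitrary filler work.

```swift
import Dispatch

/// Runs `body` once and returns the elapsed wall-clock time in milliseconds.
func measureMilliseconds(_ body: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000
}

// Time all eight runs together; per-run times may well be longer than in the
// serial case even when the batch as a whole finishes sooner.
let batchMilliseconds = measureMilliseconds {
    DispatchQueue.concurrentPerform(iterations: 8) { _ in
        var x = 0
        for i in 0..<100_000 { x = x &+ i }
        _ = x
    }
}
print("batch took \(batchMilliseconds) ms")
```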
Anyhow, let me verify your results.
Your function run seems to take a parameter some of the time only. Let me define a similar function for clarity:
func increment(_ offset : Int) {
for i in data.indices { data[i] = data[i] &+ offset }
}
On my test machine, in release mode, this code takes 0.68 ns per entry or about 2.3 cycles (at 3.4 GHz) per addition. Disabling bounds checking helps a bit (down to 0.5 ns per entry).
Anyhow. So next let us parallelize the problem as you seem to suggest:
var q = OperationQueue()
for i in 1...queues {
q.addOperation {
increment(i)
}
}
q.waitUntilAllOperationsAreFinished()
That does not seem particularly safe, but is it fast?
Well, it is faster... I hit 0.3 ns per entry.
Source code : https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/opqueue

.background will run the threads with the lowest priority. If you are looking for fast execution, consider .userInitiated and make sure you are measuring the performance with compiler optimizations turned on.
Also consider using DispatchQueue instead of OperationQueue. It might have less overhead and better performance.
Update based on your comments: try this. It goes from 38s on my laptop to 14 or so.
Notable changes:
I made the queue explicitly concurrent
I run the thing in release mode
Replaced the inner loop calculation with random number, the original got optimized out
QoS set to higher level: QoS now works as expected and .background runs forever
var data = [Int].init(repeating: 0, count: 1_000_000)
func run() {
let startTime = DispatchTime.now().uptimeNanoseconds
for i in data.indices { data[i] = Int(arc4random_uniform(1000)) }
print("\((DispatchTime.now().uptimeNanoseconds - startTime)/1_000_000)")
}
let startTime = DispatchTime.now().uptimeNanoseconds
var g = DispatchGroup()
var q = DispatchQueue(label: "myQueue", qos: .userInitiated, attributes: [.concurrent])
(1...100).forEach { i in
q.async(group: g) {
run()
}
}
g.wait()
print("\((DispatchTime.now().uptimeNanoseconds - startTime)/1_000_000)")
Something is still wrong though - serial queue runs 3x faster even though it does not use all cores.

For the sake of future readers, two observations on multithreaded performance:
There is a modest overhead introduced by multithreading. You need to make sure that there is enough work on each thread to offset this overhead. As the old Concurrency Programming Guide says:
You should make sure that your task code does a reasonable amount of work through each iteration. As with any block or function you dispatch to a queue, there is overhead to scheduling that code for execution. If each iteration of your loop performs only a small amount of work, the overhead of scheduling the code may outweigh the performance benefits you might achieve from dispatching it to a queue. If you find this is true during your testing, you can use striding to increase the amount of work performed during each loop iteration. With striding, you group together multiple iterations of your original loop into a single block and reduce the iteration count proportionately. For example, if you perform 100 iterations initially but decide to use a stride of 4, you now perform 4 loop iterations from each block and your iteration count is 25.
And goes on to say:
Although dispatch queues have very low overhead, there are still costs to scheduling each loop iteration on a thread. Therefore, you should make sure your loop code does enough work to warrant the costs. Exactly how much work you need to do is something you have to measure using the performance tools.
A simple way to increase the amount of work in each loop iteration is to use striding. With striding, you rewrite your block code to perform more than one iteration of the original loop.
You should be wary of using either operations or GCD dispatches to achieve multithreaded algorithms. This can lead to “thread explosion”. You should use DispatchQueue.concurrentPerform (previously known as dispatch_apply). This is a mechanism for performing loops in parallel, while ensuring that the degree of concurrency will not exceed the capabilities of the device.
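As a rough sketch of what that could look like for the increment loop in the question, with striding applied. incrementAll and its chunks parameter are hypothetical names; each iteration writes only to its own range of the buffer, so no synchronization is needed. Whether this beats the serial loop for work this light would have to be measured.

```swift
import Dispatch

/// Adds `offset` to every element, splitting the array into `chunks` ranges.
func incrementAll(_ data: inout [Int], by offset: Int, chunks: Int) {
    let count = data.count
    guard count > 0, chunks > 0 else { return }
    let stride = (count + chunks - 1) / chunks   // elements per iteration

    // Work on the raw buffer so the iterations can write to disjoint ranges
    // concurrently without going through the array's own mutation path.
    data.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
            let start = chunk * stride
            let end = min(start + stride, count)
            guard start < end else { return }    // trailing chunk may be empty
            for i in start ..< end {
                buffer[i] = buffer[i] &+ offset
            }
        }
    }
}

var sample = [Int](repeating: 0, count: 1_000_000)
incrementAll(&sample, by: 1, chunks: 8)
print(sample[0], sample[999_999])   // prints 1 1
```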

How do for-in loops in swift work?

I've used them, I'm familiar with the code, and I've read a number of tutorials, but I still don't understand exactly how they work: I can't run through in my head what I'm doing and what I ultimately want to achieve, as opposed to, say, an if statement, which reads quite well in English.
For-loops have always been something I've struggled with through lack of understanding, can someone offer some insight please?

The for-in loop performs a set of statements for each item in a range or collection. Swift also provides two range operators a..<b and a...b, as a shortcut for expressing a range of values.
// prints 1-10
for i in 1...10 {
print(i)
}
// This way has been removed in Swift 3 so use the above
for var i = 1; i <= 10; i+=1 {
print(i)
}

The for-in loop is used for iterations on numbers, items in an array or characters in a string.
// I want my var nb to be equal to 10:
var nb = 0
for _ in 0..<10 {
    nb += 1
    print("my nb is \(nb)")
}

To understand for loops, one needs to understand the need for repeating lines of code (an unrolled loop) for a bunch of sequential numbers or array elements, etc. A "for-loop" tells the processor to do most of the repetition for you, without the need for you to copy-paste all those nearly identical lines of code a whole bunch (maybe millions or billions) of times.
A "for in" loop lets you specify the range (of numbers or array elements, etc.) over which you want the repetition, so the code doesn't go on repeating forever.
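A tiny side-by-side sketch of that idea, using a made-up scores array: the unrolled version needs one line per element, while the for-in version does the same work for however many elements there are.

```swift
let scores = [72, 85, 91]

// Unrolled: one nearly identical line per element.
var unrolledTotal = 0
unrolledTotal += scores[0]
unrolledTotal += scores[1]
unrolledTotal += scores[2]

// for-in: "for each score in scores, add it to the total".
var loopTotal = 0
for score in scores {
    loopTotal += score
}

print(unrolledTotal, loopTotal)   // prints 248 248
```

Read aloud, the loop header is close to plain English: `score` takes the value of each element in turn, and the body runs once per element.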
