Concurrent queue calls inside serial queue? - swift

In Objective-C and Swift, is there any guarantee of order of execution for concurrent calls being made inside of a serial queue's async block?
Pseudo-code:
let serialQueue = DispatchQueue(label: "serial")
let concurrentQueue = DispatchQueue(label: "concurrent", attributes: .concurrent)

serialQueue.async { // 1
    concurrentQueue.async { // 2
        run_task1() // task that takes 1 hour
    }
}

serialQueue.async { // 3
    concurrentQueue.async { // 4
        run_task2() // task that takes 1 minute
    }
}
In the above code, is task1 guaranteed to complete before task2 is called?
Or, since they're dispatched to a concurrent queue, does the serial queue's async only guarantee that run_task1 will be added to the concurrentQueue before run_task2, without guaranteeing the order of execution?

I've numbered the blocks in your question so I can reference them here:
Blocks 1 and 3 both run on a serial queue, so block 3 will only run once block 1 is done.
However, blocks 1 and 3 don't actually wait for task1/task2; they just enqueue work to happen asynchronously in blocks 2 and 4, which finishes near-instantly.
From then on, both task 1 and 2 will be running concurrently, and finish in an arbitrary order. The only guarantee is that task1 will start before task2.
I always like to use the analogy of ordering a pizza vs making a pizza. Queuing async work is like ordering a pizza. It doesn't mean you have a pizza ready immediately, and you're not going to be blocked from doing other things while the pizzeria is baking your pizza.
Your blocks 1 and 3 are strongly ordered, so block 1 will start and finish before block 3 starts. However, all each block does is order a pizza, and that's fast. It does not mean pizza 1 (task 1) is done before pizza 2 (task 2); it just means you got off the first phone call before making the second.
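To make the distinction concrete, here is a small self-contained sketch (my own, not from the question) with the hour/minute tasks scaled down to a few seconds of sleep. Run as a command-line script, it typically prints that task1 starts first but task2 finishes first:
import Foundation

let serialQueue = DispatchQueue(label: "serial")
let concurrentQueue = DispatchQueue(label: "concurrent", attributes: .concurrent)

func runTask(_ name: String, seconds: UInt32) {
    print("\(name) started")
    sleep(seconds)                    // stand-in for the real work
    print("\(name) finished")
}

serialQueue.async {                   // block 1: returns almost immediately
    concurrentQueue.async {           // block 2
        runTask("task1", seconds: 3)  // the "1 hour" task, scaled down
    }
}

serialQueue.async {                   // block 3: runs after block 1, also returns immediately
    concurrentQueue.async {           // block 4
        runTask("task2", seconds: 1)  // the "1 minute" task, scaled down
    }
}

sleep(5)                              // keep the script alive long enough to see both tasks finish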

Synchronize work between CPU and GPU within single command buffer using MTLSharedEvent

I am trying to use MTLSharedEvent along with MTLSharedEventListener to synchronize computation between the GPU and CPU, as in the example provided by Apple (https://developer.apple.com/documentation/metal/synchronization/synchronizing_events_between_a_gpu_and_the_cpu). Basically what I want to achieve is to have work split into 3 parts executed in order, like so:
GPU computation part 1
CPU computation based on results from GPU computation part 1
GPU computation part 2 after CPU computation
My problem is that the eventListener block is always called before the command buffer is scheduled for execution, which makes my CPU task execute first.
To simplify the case, let's use simple commands that fill a MTLBuffer with certain values (my original use case is more complicated, using compute encoders with custom shaders, but it behaves the same):
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let event = device.makeSharedEvent()!
let dispatchQueue = DispatchQueue(label: "myqueue")
let eventListener = MTLSharedEventListener(dispatchQueue: dispatchQueue)
let metalBuffer = device.makeBuffer(length: 2048, options: MTLResourceOptions.storageModeShared)!
let buffer = queue.makeCommandBuffer()!
NSLog("Start - signaled value: \(event.signaledValue)")
event.notify(eventListener, atValue: 1) { event, value in
    // CPU work
    let pointer = metalBuffer.contents().assumingMemoryBound(to: UInt8.self)
    for i in 0..<512 {
        (pointer + i).pointee = (pointer + i).pointee + 1
    }
    NSLog("Event notification - signaled value: \(value), buffer status: \(buffer.status.rawValue)")
    event.signaledValue = 2
}
// GPU work part 1
let encoder1 = buffer.makeBlitCommandEncoder()!
encoder1.fill(buffer: metalBuffer, range: .init(0...127), value: 22)
encoder1.endEncoding()
// signal with 1 to start CPU task
buffer.encodeSignalEvent(event, value: 1)
// wait for value >= 2 to proceed
buffer.encodeWaitForEvent(event, value: 2)
// GPU work part 2
let encoder2 = buffer.makeBlitCommandEncoder()!
encoder2.fill(buffer: metalBuffer, range: .init(128...511), value: 255)
encoder2.endEncoding()
buffer.addScheduledHandler { buffer in
    NSLog("Buffer scheduled - signaled value: \(event.signaledValue)")
}
buffer.addCompletedHandler { buffer in
    NSLog("Buffer completed - signaled value: \(event.signaledValue)")
}
buffer.commit()
buffer.waitUntilCompleted()
Output:
2022-01-09 23:46:08.774 Sync[76882:3531755] Metal GPU Frame Capture Enabled
2022-01-09 23:46:08.805 Sync[76882:3531755] Start - signaled value: 0
2022-01-09 23:46:08.808 Sync[76882:3531764] Event notification - signaled value: 1, buffer status: 2 (.committed)
2022-01-09 23:46:08.809 Sync[76882:3531763] Buffer scheduled - signaled value: 2
2022-01-09 23:46:08.809 Sync[76882:3531763] Buffer completed - signaled value: 2
As you can see, the eventListener logs the buffer status as .committed.
What’s the matter here? Am I missing something?
System: macOS 12.0.1, Apple M1 Pro, Xcode 13.2.1
It is perfectly fine that the command buffer is committed. In fact, if it weren't committed, you would never get to the notify block.
The GPU and CPU run in parallel. So when you use MTLEvent you don't stop executing CPU code (all of the Swift code, actually). You just tell the GPU in what order to execute the GPU commands.
So what's happening in your case:
All your code runs in a single CPU thread without any interruption.
The GPU starts executing the command buffer's commands only when you call commit(). Before that, the GPU doesn't actually do anything; you have merely scheduled commands to be performed on the GPU, not performed them.
When the GPU executes the commands, it checks your MTLEvent. It performs part 1, then signals the event with value 1, which runs the notify block; the notify block sets the signaled value to 2, and the GPU then performs the second block of GPU work.
But again, all the actual GPU work starts only after you call commit() on the command buffer. That's why the buffer is already committed in the notify block: the block runs after commit().
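If you want to convince yourself that the ordering is nevertheless what you asked for, a small check you could append after waitUntilCompleted() (my addition, not part of the original question) is to inspect the buffer: the CPU increment runs between the two GPU fills, so the first 128 bytes should end up as 23 and the region overwritten by part 2 as 255:
// Hypothetical check appended after buffer.waitUntilCompleted() in the code above.
let bytes = metalBuffer.contents().assumingMemoryBound(to: UInt8.self)
assert(bytes[0] == 23)     // 22 written by GPU part 1, then +1 from the CPU pass
assert(bytes[200] == 255)  // overwritten by GPU part 2, which waited for the CPU pass
NSLog("byte[0] = \(bytes[0]), byte[200] = \(bytes[200])")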

Delay iteration of a collection in RxSwift

My actual requirement:
I have a list of custom objects, and I want to iterate it with a delay. I can't use DispatchQueue.main.asyncAfter in my for loop since my iterations create a CoreData object that triggers FetchedResultController and hence updates my TableView. Anyway, so I tried using Rx to iterate my list with a delay of 1 second each. But I am unable to do so.
Question
I want to delay the iteration of each element of the array using RxSwift.
I was able to do it in Java, but couldn't do so in RxSwift.
The .delay() operator didn't help either; it just delayed the whole process.
Any example would help, so I am not posting any specific code... but this is what I've been trying so far:
var array = [1, 2, 3, 4, 5]
Observable.from(array)
    .delay(RxTimeInterval(5), scheduler: MainScheduler.instance)
    .subscribe { (intValue) in
        print("onNext() \(intValue)")
    }
Output
onNext() next(1)
onNext() next(2)
onNext() next(3)
onNext() next(4)
onNext() next(5)
onNext() completed
The output gets printed after 5 seconds, not with a 5-second interval between elements.
Also, I am not getting plain integer values, but next(1) instead.
This is a common confusion. As you have learned, delay is applied to every element equally, so all of them are delayed by five seconds, rather than a five-second delay being put between each event (which is what you want: the first event immediately, the second at five seconds, the third at ten, and so on).
Once you realize that what you are really trying to do is put a delay between each event, the solution becomes more clear. Or at least, the reason why just putting a delay on the from operator isn't working should be more clear.
The solution is to use one of the flatMap variants:
let array = [1, 2, 3, 4, 5]
Observable.from(array)
    .concatMap { Observable.empty().delay(.seconds(5), scheduler: MainScheduler.instance).startWith($0) }
    .subscribe { (intValue) in
        print("onNext() \(intValue)")
    }
The line Observable.empty().delay(.seconds(5), scheduler: MainScheduler.instance).startWith($0) will immediately emit the value, but then wait five seconds before emitting a completed event.
The concatMap operator calls the closure on each incoming event and concatenates the resulting Observables together so that the following Observables aren't subscribed to until the previous one completes.
Learn more about this in my article, The Many Faces of FlatMap
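As for the second point in your question (getting next(1) instead of 1): the closure form subscribe { ... } hands you the whole Event, so printing it shows next(...). If you only want the element, use subscribe(onNext:) instead. For example (same array and imports as above, shown here only to illustrate the subscription style):
import RxSwift

let array = [1, 2, 3, 4, 5]
_ = Observable.from(array)
    .concatMap { Observable.empty().delay(.seconds(5), scheduler: MainScheduler.instance).startWith($0) }
    .subscribe(onNext: { value in
        print("onNext() \(value)")   // prints the bare element, e.g. "onNext() 1"
    })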
An alternative solution based on the comments that Sandeep Bhandari wrote on the question would be to use an interval to constrain when the from operator can emit values. Something like this:
let array = [1, 2, 3, 4, 5]
Observable.zip(
        Observable.from(array),
        Observable<Int>.interval(.seconds(5), scheduler: MainScheduler.instance)
            .take(while: { $0 != array.count })
    )
    .map { $0.0 } // keep just the array element, drop the interval tick
    .subscribe { (intValue) in
        print("onNext() \(intValue)")
    }

How to scale random simulations with parallelism in Swift?

I'm playing around in Swift 5.1 (on the Mac) running little simulations of tennis matches. Naturally part of the simulation is randomly choosing who wins each point.
Below is the relevant part of the code where I do the parallelism.
func combine(result: MatchTally)
{
    overallTally.add(result: result)
}

DispatchQueue.concurrentPerform(iterations: cycleCount) { iterationNumber in
    var counter = MatchTally()
    for _ in 1...numberOfSimulations
    {
        let result = playMatch(between: playerOne, and: playerTwo)
        counter[result.0, result.1] += 1
    }
    combiningQueue.sync { combine(result: counter) }
}
With an appropriate simulation run count chosen, a single queue takes about 5s. If I set the concurrent queues to 2, the simulation now takes 3.8s per queue (i.e. it took 7.2s). Doubling again to 4 queues results in 4.8s / queue. And finally with 6 queues (the machine is a 6 core Intel i7) things take 5.6s / queue.
For those who need more convincing that this relates to random number generation (I'm using Double.random(in: 0...1)): I replaced the code where most of the random outcomes are generated with a fixed result (I couldn't replace the second place, as I still needed a tie-break) and adjusted the number of simulations appropriately. The outcomes were as follows:
1 queue: 5s / queue
2 queues: 2.7s / queue
4 queues: 1.9s / queue
6 queues: 1.7s / queue
So as you can see, it appears that the randomness part is resistant to running in parallel.
I've also tried with drand48() and encountered the same issues. Anybody know whether this is just the way things are?
Xcode 11.3,
Swift 5.1,
macOS 10.15.3,
Mac mini 2018,
6 core i7 (but have encountered the same thing over the years on different hardware)
For anyone interested in reproducing this themselves, here is some code I created and Alexander added to.
import Foundation

func formatTime(_ date: Date) -> String
{
    let df = DateFormatter()
    df.dateFormat = "h:mm:ss.SSS"
    return df.string(from: date)
}

func something(_ iteration: Int)
{
    var tally = 0.0
    let startTime = Date()
    print("Start #\(iteration) - \(formatTime(startTime))")
    for _ in 1...1_000_000
    {
        tally += Double.random(in: 0...100)
        // tally += 3.5
    }
    let endTime = Date()
    print("End #\(iteration) - \(formatTime(endTime)) - elapsed: \(endTime.timeIntervalSince(startTime))")
}

print("Single task performed on main thread")
something(0) // Used to get a baseline for single run

print("\nMultiple tasks performed concurrently")
DispatchQueue.concurrentPerform(iterations: 5, execute: something)
Swapping out the random additive in the loop for a fixed one demonstrates how well the code scales in one scenario, but not the other.
Looks like the solution is to use a less 'fashionable' generator such as drand48(). I believed I had already tested that option, but it seems I was wrong. drand48() doesn't suffer from the same issue, so I guess the problem is inherent to arc4random(), which I believe Double.random() is based upon.
The other positive is that it is about 4 times faster to return a value. So my simulations won't be cryptographically secure, but then what tennis match is? 🤭
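If you would rather keep the Double.random(in:) call sites than switch to drand48(), another option (my suggestion, not part of the original answer) is to give each concurrentPerform iteration its own seeded RandomNumberGenerator, so the shared system generator is never touched. A small SplitMix64 generator is enough for simulation work, for example:
import Foundation

// Tiny non-cryptographic PRNG; one private instance per iteration means no shared state.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

DispatchQueue.concurrentPerform(iterations: 5) { iteration in
    var rng = SplitMix64(seed: 0x1234_5678 &+ UInt64(iteration))  // per-thread generator
    var tally = 0.0
    for _ in 1...1_000_000 {
        tally += Double.random(in: 0...100, using: &rng)
    }
    print("#\(iteration) tally: \(tally)")
}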

Why is a monitor implemented in terms of semaphores this way?

I have trouble understanding the implementation of a monitor in terms of semaphores from Operating System Concepts
5.8.3 Implementing a Monitor Using Semaphores
We now consider a possible implementation of the monitor mechanism
using semaphores.
For each monitor, a semaphore mutex (initialized to 1) is provided. A
process must execute wait(mutex) before entering the monitor and must
execute signal(mutex) after leaving the monitor.
Since a signaling process must wait until the resumed process either leaves or waits, an additional semaphore, next, is introduced,
initialized to 0. The signaling processes can use next to suspend
themselves. An integer variable next_count is also provided to count
the number of processes suspended on next. Thus, each external
function F is replaced by
wait(mutex);
...
body of F
...
if (next_count > 0)
    signal(next);
else
    signal(mutex);
Mutual exclusion within a monitor is ensured.
We can now describe how condition variables are implemented as well.
For each condition x, we introduce a semaphore x_sem and an
integer variable x_count, both initialized to 0. The operation x.wait() can now be implemented as
x_count++;
if (next_count > 0)
    signal(next);
else
    signal(mutex);
wait(x_sem);
x_count--;
The operation x.signal() can be implemented as
if (x_count > 0) {
    next_count++;
    signal(x_sem);
    wait(next);
    next_count--;
}
What is the reason for introducing the semaphore next and the count next_count of processes suspended on next?
Why are x.wait() and x.signal() implemented the way they are?
Thanks.
------- Note -------
WAIT() and SIGNAL() denote calls on monitor methods
wait() and signal() denote calls to semaphore methods, in the explanation that follows.
------- End of Note -------
I think it is easier if you think in terms of a concrete example. But before that, let's first try to understand what a monitor is. As explained in the book, a monitor is an Abstract Data Type, meaning that it is not a real type that can be used to instantiate a variable. Rather, it is like a specification with rules and guidelines based on which different languages can provide support for process synchronization.
Semaphores were introduced as a software-based solution for achieving synchronization, as opposed to hardware-based approaches like TestAndSet() or Swap(). Even with semaphores, programmers had to ensure that they invoke the wait() and signal() methods correctly and in the right order. So an abstract specification called a monitor was introduced to encapsulate all of these synchronization concerns as one primitive, so that any process executing inside the monitor gets these (semaphore wait and signal) invocations applied correctly.
With monitors, all shared variables and the functions that use them are put into the monitor structure, and when any of these functions is invoked the monitor implementation takes care of protecting the shared resources with mutual exclusion and handling any synchronization issues.
Now with monitors, unlike semaphores or other synchronization techniques, we are not dealing with just one critical section but with many of them, one per function. In addition, we also have shared variables that are accessed within these functions. To ensure that only one process at a time is executing any of the monitor's functions, we can use a global semaphore called mutex.
Consider the example of the solution for the dining philosophers problem using monitors below.
monitor dining_philosopher
{
    enum {THINKING, HUNGRY, EATING} state[5];
    condition self[5];

    void pickup(int i) {
        state[i] = HUNGRY;
        test(i);
        if (state[i] != EATING)
            self[i].WAIT();
    }

    void putdown(int i) {
        state[i] = THINKING;
        test((i + 4) % 5);
        test((i + 1) % 5);
    }

    void test(int i) {
        if (
            (state[(i + 4) % 5] != EATING) &&
            (state[i] == HUNGRY) &&
            (state[(i + 1) % 5] != EATING))
        {
            state[i] = EATING;
            self[i].SIGNAL();
        }
    }

    initialization_code() {
        for (int i = 0; i < 5; i++)
            state[i] = THINKING;
    }
}
Ideally, how a process might invoke these functions would be in the following sequence:
DiningPhilosophers.pickup(i);
...
// do some work
...
DiningPhilosophers.putdown(i);
Now, whilst one process is executing inside the pickup() method, another might try to invoke the putdown() (or even the pickup()) method. In order to ensure mutual exclusion, we must ensure that only one process is running inside the monitor at any given time. To handle these cases we have a global semaphore mutex that wraps all the invokable (pickup & putdown) methods. So these two methods will be implemented as follows:
void pickup(int i) {
    // wait(mutex);
    state[i] = HUNGRY;
    test(i);
    if (state[i] != EATING)
        self[i].WAIT();
    // signal(mutex);
}

void putdown(int i) {
    // wait(mutex);
    state[i] = THINKING;
    test((i + 4) % 5);
    test((i + 1) % 5);
    // signal(mutex);
}
Now only one process will be able to execute inside the monitor in any of its methods. With this setup, if process P1 has executed pickup() (but has yet to put down the chopsticks) and then process P2 (say, an adjacent diner) tries to pickup(): since its chopstick (a shared resource) is in use, it has to wait() for it to become available. Let's look at the WAIT and SIGNAL implementation of the monitor's condition variables:
WAIT() {
    x_count++;
    if (next_count > 0)
        signal(next);
    else
        signal(mutex);
    wait(x_sem);
    x_count--;
}

SIGNAL() {
    if (x_count > 0) {
        next_count++;
        signal(x_sem);
        wait(next);
        next_count--;
    }
}
The WAIT implementation for condition variables is different from the semaphore's wait() because it has to provide more functionality, like allowing other processes to invoke functions of the monitor (while it waits) by releasing the global mutex semaphore. So, when WAIT is invoked by P2 from the pickup() method, it calls signal(mutex), allowing other processes to invoke the monitor methods, and then calls wait(x_sem) on the semaphore specific to the condition. Now P2 is blocked here. In addition, the variable x_count keeps track of the number of processes waiting on the condition variable (self).
So when P1 invokes putdown(), this will invoke SIGNAL via the test() method. Inside SIGNAL, when P1 invokes signal(x_sem) on the chopstick it holds, it must do one additional thing: it must ensure that only one process is running inside the monitor. If it only called signal(x_sem), then from that point onward both P1 and P2 would be doing things inside the monitor. To prevent this, P1, after releasing its chopstick, blocks itself until P2 finishes. To block itself, it uses the semaphore next, and to notify P2 (or some other process) that someone is blocked, it uses the counter next_count.
So now P2 gets the chopsticks, and before it exits the pickup() method it must release P1, which is waiting for P2 to finish. So we must change the pickup() method (and all functions of the monitor) as follows:
void pickup(int i) {
    // wait(mutex);
    state[i] = HUNGRY;
    test(i);
    if (state[i] != EATING)
        self[i].WAIT();
    /**************
    if (next_count > 0)
        signal(next);
    else
        signal(mutex);
    **************/
}

void putdown(int i) {
    // wait(mutex);
    state[i] = THINKING;
    test((i + 4) % 5);
    test((i + 1) % 5);
    /**************
    if (next_count > 0)
        signal(next);
    else
        signal(mutex);
    **************/
}
So now, before any process exits a function of the monitor, it checks whether there are any waiting processes; if so, it releases one of them rather than releasing the global mutex semaphore. The last such waiting process releases the mutex semaphore, allowing new processes to enter the monitor's functions.
I know it's pretty long, but it took some time for me to understand and wanted to put it in writing. I will post it on a blog soon.
If there are any mistakes please let me know.
Best,
Shabir
I agree it's confusing.
Let's first understand the first piece of code:
// if you are the only process in the queue, just take the monitor and invoke the function F
wait(mutex);
...
body of F
...
if (next_count > 0)
    // if some process is already waiting to take the monitor, signal the "next" semaphore and let it take the monitor
    signal(next);
else
    // otherwise, signal the "mutex" semaphore so that a process that requests the monitor later can take it
    signal(mutex);
Back to your questions:
What is the reason for introducing the semaphore next and the count next_count of processes suspended on next?
Imagine you have a process that is doing some I/O inside the monitor and needs to be blocked until it finishes. You let other processes waiting in the ready queue take the monitor and invoke the function F.
next_count is only there to keep track of how many processes are suspended on next.
A process suspended on the next semaphore is a process that issued a signal on a condition variable: it suspends itself until the resumed process either leaves the monitor or waits again and wakes it up.
Why are x.wait() and x.signal() implemented the way they are?
Let's take x.wait():
semaphore x_sem; // (initially = 0)
int x_count = 0; // number of processes waiting on condition (x)

/*
 * This indicates that some process is issuing a wait on the condition x.
 * If some process sends a signal x.signal() while no process is waiting
 * on condition x, the signal is lost (it has no effect).
 */
x_count++;

/*
 * If there is some process suspended on next (a signaler waiting to reacquire the monitor),
 * signal(next) wakes it so it can take the monitor.
 */
if (next_count > 0)
    signal(next);
/*
 * Otherwise, no process is waiting on next.
 * signal(mutex) releases the mutex so a new process can enter the monitor.
 */
else
    signal(mutex);

/*
 * Now the process that called x.wait() blocks until some other process
 * releases (signals) the x_sem semaphore via signal(x_sem).
 */
wait(x_sem);

// The process is back from blocking.
// We are done; decrease x_count.
x_count--;
Now let's take x.signal():
// if there are processes waiting on condition x
if (x_count > 0) {
    // the signaling process is about to block itself on next, so record that in next_count
    next_count++;
    // release x_sem so the process waiting on condition x resumes
    signal(x_sem);
    // wait until the resumed process is done (it leaves the monitor or waits again)
    wait(next);
    // we are done
    next_count--;
}
Comment if you have any questions.
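If it helps to see the same scheme in Swift (the language used elsewhere on this page), here is a minimal sketch built on DispatchSemaphore. The names TinyMonitor, Condition, and run are mine, purely for illustration; as in the textbook code, condition wait/signal must only be called from inside a monitor function, i.e. inside run's body:
import Dispatch

final class TinyMonitor {
    private let mutex = DispatchSemaphore(value: 1)  // monitor entry lock, initialized to 1
    private let next = DispatchSemaphore(value: 0)   // signalers suspend themselves here
    private var nextCount = 0                        // number of processes suspended on `next`

    final class Condition {
        fileprivate let sem = DispatchSemaphore(value: 0)  // x_sem
        fileprivate var count = 0                          // x_count
    }

    // Wraps a monitor function F: wait(mutex) ... body of F ... signal(next) or signal(mutex).
    func run<T>(_ body: () -> T) -> T {
        mutex.wait()
        let result = body()
        if nextCount > 0 { next.signal() } else { mutex.signal() }
        return result
    }

    // x.wait(): give up the monitor, then block on the condition's semaphore.
    func wait(_ x: Condition) {
        x.count += 1
        if nextCount > 0 { next.signal() } else { mutex.signal() }
        x.sem.wait()
        x.count -= 1
    }

    // x.signal(): wake one waiter and suspend ourselves on `next`,
    // so only one process is ever active inside the monitor.
    func signal(_ x: Condition) {
        if x.count > 0 {
            nextCount += 1
            x.sem.signal()
            next.wait()
            nextCount -= 1
        }
    }
}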

Incorrect implementation of Barrier Method of synchronisation

Consider the following barrier method used to implement synchronization:
void barrier()
{
    P(s);
    process_arrived++;
    V(s);
    while (process_arrived != 3);
    P(s);
    process_left++;
    if (process_left == 3)
    {
        process_arrived = 0;
        process_left = 0;
    }
    V(s);
}
It is known that this code does not work because of a flaw, but I am not able to find the flaw.
The problem is with the condition if (process_left == 3). It may lead to deadlock if two barrier invocations are used in immediate succession.
Initially, process_arrived and process_left will both be 0.
When a process arrives, it increments process_arrived and waits till the maximum number of processes (here, 3) have arrived. After that, processes are allowed to leave.
Consider the following scenario:
P1 arrives and waits till process_arrived becomes 3 (currently process_arrived = 1).
P2 arrives and waits till process_arrived becomes 3 (currently process_arrived = 2).
Now P3 arrives and makes process_arrived = 3. Its while-loop condition fails, so it executes further, making process_left = 1, and re-enters the function immediately. (Assume P1 and P2 have also observed process_arrived == 3 and left their while loops, but have not yet incremented process_left.)
P3's second invocation makes process_arrived = 4, and it waits in the while loop.
P2 gets a chance to execute, makes process_left = 2, and leaves.
P1 executes further and finds that process_left = 3, thereby resetting both process_arrived and process_left to 0. (Remember that P3 has already entered its second barrier invocation and is waiting there, so its count is lost by the reset.)
P1 enters the barrier again, makes process_arrived = 1, and waits.
P2 also enters again, makes process_arrived = 2, and waits.
Now every process will wait forever, since process_arrived is stuck at 2. Hence a deadlock has occurred.
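For contrast, a known-correct alternative is the two-phase (reusable) barrier described in The Little Book of Semaphores: nobody may re-enter the barrier until the whole group has left the previous round, which closes exactly the hole exploited above. Here is a minimal Swift sketch using DispatchSemaphore, intended for exactly n cooperating threads (the names ReusableBarrier and arrive are mine, for illustration only):
import Dispatch

final class ReusableBarrier {
    private let mutex = DispatchSemaphore(value: 1)       // plays the role of s
    private let turnstile1 = DispatchSemaphore(value: 0)  // closed until everyone has arrived
    private let turnstile2 = DispatchSemaphore(value: 1)  // open; closed while a round is in progress
    private var count = 0
    private let n: Int

    init(parties: Int) { n = parties }

    func arrive() {
        // Phase 1: wait until all n parties have arrived.
        mutex.wait()
        count += 1
        if count == n {
            turnstile2.wait()    // close the second gate
            turnstile1.signal()  // open the first gate
        }
        mutex.signal()
        turnstile1.wait()        // pass through the first gate...
        turnstile1.signal()      // ...and let the next thread through

        // Phase 2: wait until all n parties have left phase 1, so nobody
        // can lap the barrier and corrupt the count (the flaw discussed above).
        mutex.wait()
        count -= 1
        if count == 0 {
            turnstile1.wait()    // close the first gate again
            turnstile2.signal()  // open the second gate
        }
        mutex.signal()
        turnstile2.wait()
        turnstile2.signal()
    }
}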