Modification to "Implementing an N process barrier using semaphores" - operating-system

Recently I see this problem which is pretty similar to First reader/writer problem.
Implementing an N process barrier using semaphores
I am trying to modify it to made sure that it can be reuse and work correctly.
n = the number of threads
count = 0
mutex = Semaphore(1)
barrier = Semaphore(0)
mutex.wait()
count = count + 1
if (count == n){ barrier.signal()}
mutex.signal()
barrier.wait()
mutex.wait()
count=count-1
barrier.signal()
if(count==0){ barrier.wait()}
mutex.signal()
Is this correct?
I'm wondering if there exist some mistakes I didn't detect.

Your pseudocode correctly returns barrier back to initial state. Insignificant suggestion: replace
barrier.signal()
if(count==0){ barrier.wait()}
with IMHO more readable
if(count!=0){ barrier.signal()} //if anyone left pending barrier, release it
However, there may be pitfalls in the way, barrier is reused. Described barrier has two states:
Stop each thread until all of them are pending.
Resume each thread until all of them are running
There is no protection against mixing them: some threads are being resumed, while other have already hit the first stage again. For example, you have bunch of threads, which do some stuff and then sync up on barrier. Each thread body would be:
while (!exit_condition) {
do_some_stuff();
pend_barrier(); // implementation from current question
}
Programmer expects, that number of calls for do_some_stuff() will be the same for all threads. What may (or may not) happen depending on timing: the first thread released from barrier finishes do_some_stuff() calculations before all threads have left the barrier, so it reentered pending for the second time. As a result he will be released along with other threads in the current barrier release iteration and will have (at least) one more call to do_some_stuff().

Related

MTLSharedEventListener block called before command buffer scheduling and not in-flight

I am using a MTLSharedEvent to occasionally relay new information from the CPU to the GPU by writing into a MTLBuffer with storage mode .storageModeManaged within a block registered by the shared event (using the notify(_:atValue:block:) method of MTLSharedEvent, with a MTLSharedEventListener configured to be notified on a background dispatch queue). The process looks something like this:
let device = MTLCreateSystemDefaultDevice()!
let synchronizationQueue = DispatchQueue(label: "com.myproject.synchronization")
let sharedEvent = device.makeSharedEvent()!
let sharedEventListener = MTLSharedEventListener(dispatchQueue: synchronizationQueue)
// Updated only occasionally on the CPU (on user interaction). Mostly written to
// on the GPU
let managedBuffer = device.makeBuffer(length: 10, options: .storageModeManaged)!
var doExtra = true
func computeSomething(commandBuffer: MTLCommandBuffer) {
// Do work on the GPU every frame
// After writing to the buffer on the GPU, synchronize the buffer (required)
let blitToSynchronize = commandBuffer.makeBlitCommandEncoder()!
blitToSynchronize.synchronize(resource: managedBuffer)
blitToSynchronize.endEncoding()
// Occassionally, add extra information on the GPU
if doExtraWork {
// Register a block to write into the buffer
sharedEvent.notify(sharedEventListener, atValue: 1) { event, value in
// Safely write into the buffer. Make sure we call `didModifyRange(_:)` after
// Update the counter
event.signaledValue = 2
}
commandBuffer.encodeSignalEvent(sharedEvent, value: 1)
commandBuffer.encodeWaitForEvent(sharedEvent, value: 2)
}
// Commit the work
commandBuffer.commit()
}
The expected behavior is as follows:
The GPU does some work with the managed buffer
Occasionally, the information needs to be updated with new information on the CPU. In this frame, we register a block of work to be executed. We do so in a dedicated block because we cannot guarantee that by the time execution on the main thread reaches this point the GPU is not simultaneously reading from or writing to the managed buffer. Hence, it is unsafe to simply write to it currently and must make sure the GPU is not doing anything with this data
When the GPU schedules this command buffer to be executed, commands executed before the encodeSignalEvent(_:value:) call are executed and then execution on the GPU stops until the block increments the signaledValue property of the event passed into the block
When execution reaches the block, we can safely write into the managed buffer because we know the CPU has exclusive access to the resource. Once we've done so, we resume execution of the GPU
The issue is that it seems Metal is not calling the block when the GPU is executing the command, but rather before the command buffer is even scheduled. Worse, the system seems to "work" with the initial command buffer (the very first command buffer, before any other are scheduled).
I first noticed this issue when I looked at a GPU frame capture after my scene would vanish after a CPU update, which is where I saw that the GPU had NaNs all over the place. I then ran into this strange situation when I purposely waited on the background dispatch queue with a sleep(:_) call. Quite correctly, my shared resource semaphore (not shown, signaled in a completion block of the command buffer and waited on in the main thread) reached a value of -1 after committing three command buffers to the command queue (three being the number of recycled shared MTLBuffers holding scene uniform data etc.). This suggests that the first command buffer has not finished executing by then time the CPU is more than three frames ahead, which is consistent with the sleep(_:) behavior. Again, what isn't consistent is the ordering: Metal seems to call the block before even scheduling the buffer. Further, in subsequent frames, it doesn't seem that Metal cares that the sharedEventListener block is taking so long and schedules the command buffer for execution even while the block is running, which finishes dozens of frames later.
This behavior is completely inconsistent with what I expect. What is going on here?
P.S.
There is probably a better way to periodically update a managed buffer whose contents are mostly
modified on the GPU, but I have not yet found a way to do so. Any advice on this subject is appreciated as well. Of course, a triple buffer system could work, but it would waste a lot of memory as the managed buffer is quite large (whereas the shared buffers managed by the semaphore are quite small)
I think I have the answer for you, but I'm not sure.
From MTLSharedEvent doc entry
Commands waiting on the event are allowed to run if the new value is equal to or greater than the value for which they are waiting. Similarly, setting the event's value triggers notifications if the value is equal to or greater than the value for which they are waiting.
Which means, that if you are passing values 1 and 2 like you show in your snippet, if will only work a single time and then the event won't be waited and listeners won't be notified.
You have to make sure the value you are waiting on and then signaling is monotonically rising every time, so you have to bump it up by 1 or more.

How mutex guarantee ownership in freeRTOS?

I'm playing with Mutex in freeRTOS using esp32. in some documents i have read that mutex guarantee ownership, which mean if a thread (let's name it task_A) locks up a critical resource (take token) other threads (task_B and task_C) will stay in hold mode waiting for that resource to be unlocked by the same thread that locked it up(which is task_A). i tried to prove that by setting up the other tasks (task_B and task_C) to give a token before start doing anything and just after that it will try to take a token from the mutex holder, which is surprisingly worked without showing any kid of error.
Well, the method i used to verify or display how things works i created a display function that read events published (set and cleared) by each task (when it's in waiting mode it set the waiting bit up if it's working it will set the working bit up etc..., you get the idea). and a simple printf() in case of error in take or give function ( xSemaphoreTake != true and xSemaphoreGive != true).
I can't use the debug mode because i don't have any kind of micro controller debugger.
This is an example of what i'm trying to do:
i created many tasks and each one will call this function but in different time with different setup.
void vVirtualResource(int taskId, int runTime_ms){
int delay_tick = 10;
int currentTime_tick = 0;
int stopTime_tick = runTime_ms/portTICK_PERIOD_MS;
if(xSemaphoreGive(xMutex)!=true){
printf("Something wrong in giving first mutex's token in task id: %d\n", taskId);
}
while(xSemaphoreTake(xMutex, 10000/portTICK_PERIOD_MS) != true){
vTaskDelay(1000/portTICK_PERIOD_MS);
}
// notify that the task with <<task id>> is currently running and using this resource
switch (taskId)
{
case 1:
xEventGroupClearBits(xMutexEvent, EVENTMASK_MUTEXTSK1);
xEventGroupSetBits(xMutexEvent, EVENTRUN_MUTEXTSK1);
break;
case 2:
xEventGroupClearBits(xMutexEvent, EVENTMASK_MUTEXTSK2);
xEventGroupSetBits(xMutexEvent, EVENTRUN_MUTEXTSK2);
break;
case 3:
xEventGroupClearBits(xMutexEvent, EVENTMASK_MUTEXTSK3);
xEventGroupSetBits(xMutexEvent, EVENTRUN_MUTEXTSK3);
break;
default:
break;
}
// start running the resource
while(currentTime_tick<stopTime_tick){
vTaskDelay(delay_tick);
currentTime_tick += delay_tick;
}
// gives back the token
if(xSemaphoreGive(xMutex)!=true){
printf("Something wrong in giving mutex's token in task id: %d\n", taskId);
}
}
You will notice that for the very first time, the first task that will start running in the processor will print out the first error message because it can't give a token while there still a token in the mutex holder, it's normal, so i just ignore it.
Hope someone can explain to me how mutex guarantee ownership using code in freeRTOS. In the first place i didn't use the first xSemaphoreGive function and it worked fine. but that doesn't mean it guarantee anything. or i'm not coding right.
Thank you.
Your example is quite convoluted, I also don't see clear code of task_A, task_B or task_C so I'll try to explain on a simplier example which hopefully explains how mutex guarantees resource ownership.
The general approach to working with mutexes is the following:
void doWork()
{
// attempt to take mutex
if(xSemaphoreTake(mutex, WAIT_TIME) == pdTRUE)
{
// mutex taken - do work
...
// release mutex
xSemaphoreGive(mutex);
}
else
{
// failed to take mutex for 'WAIT_TIME' amount of time
}
}
The doWork function above is the function that may be called by multiple threads at the same time and needs to be protected. This pattern repeats for every function on given resource that needs protection. If resource is more complex, a good approach is to guard the top-most functions that are callable by threads, then if mutex is successfully taken call internal functions that do the actual work.
The ownership guarantee you speak about is the fact that there may not be more than one context (threads, but also interrupts) that are under the if(xSemaphoreTake(mutex, WAIT_TIME) == pdTRUE) statement. In other words, if one context successfully takes the mutex, it is guaranteed that no other context will be able to also take it, unless the original context releases it with xSemaphoreGive first.
Now as for your scenario - while it is not entirely clear to me how it's supposed to work, I can see two issues with your code:
xSemaphoreGive at the beginning of the function - don't do that. Mutexes are by default "given" and you're not supposed to be "giving" it if you aren't the one "taking" it first. Always put a xSemaphoreGive under a successful xSemaphoreTake and nowhere else.
This code block:
while(xSemaphoreTake(xMutex, 10000/portTICK_PERIOD_MS) != true){
vTaskDelay(1000/portTICK_PERIOD_MS);
}
If you need to wait for mutex for longer - specify a longer time. If you want infinite wait, simply specify longest possible time (0xFFFFFFFF). In your scenario, you're polling for mutex every 10s, then delay for 1s during which mutex isn't actually checked, meaning there will be cases where you'll have to wait almost a full second after mutex is released by other thread to start doing work in the current thread that requested it. Waiting for mutex is already done by RTOS in an optimal way - it'll wake the highest priority task currently waiting for the mutex as soon as it's released, there's no need to do more than necessary.
If I was to give an advice of how to fix your example - simplify it and don't do more than needed such as additional calls to xSemaphoreGive or implementing your own waiting for mutex. Isolate the portion of code that performs some work to a separate function that does a single call to xSemaphoreTake at the very top and a single call to xSemaphoreGive only if xSemaphoreTake succeeds. Then call this function from different threads to test whether it works.

What happens if a thread is in the critical section or entering the critical section?

I am trying to better understand a chapter and have been confused about what happens if a thread is in the critical section or is entering the critical section. May someone explain or give me an idea on the process of what the thread undergoes in such circumstances? Thank you.
For an example, let's assume that you have an array, and multiple threads that read and write to the array; and if different threads are reading and writing to the array at the same time they'd see inconsistent data and it'd cause problems. To prevent those problems you protect the array with some kind of lock - before doing anything with the array a thread acquires the array's lock, and when it's finished using the array the thread releases the array's lock.
For example:
acquire_array_lock();
/** Critical section (code that does something with the array) **/
release_array_lock();
There's nothing special about the code in the critical section. It does whatever it was designed to do (maybe sorting the array, maybe adding up all the numbers in the array, maybe displaying the array, etc) using code that's no different to code that you might use to do the same thing in a single-threaded system without locks.
The only special parts are the code to acquire and release the lock.
There are many types of locks (spinlocks, mutexes, semaphores), but they all have the same fundamental principle - when acquiring it you have something (e.g. a variable) to determine if a thread can/can't continue, then either (if the thread can't continue) some kind of waiting or (if the thread can continue) some kind of change to let others know they need to wait; and when releasing you have something to let others know they can stop waiting.
The main difference between different kinds of locks is the implementation details - what kind of data is used to determine if a thread can/can't continue, and how a thread waits.
For the simplest kind of lock (a spinlock) you might just have a single "yes/no" flag, a little bit like this (but not literally like this):
acquire_lock(void) {
while(myLock == 0) {
// do nothing then retry
}
myLock = 1;
}
release_lock(void) {
myLock = 0;
}
However this won't work because two or more threads can see that myLock == 0 at the same time and think they can both continue (and then do the myLock = 1 after it's too late). To fix this you need assembly language or special language support for atomic operations (e.g. a special function for "test and set" or "compare and exchange").
The reason this is called a "spinlock" is that (if a thread needs to wait) it wastes CPU time continually checking ("spinning") to see if it can continue. Instead of doing this (to avoid wasting CPU time), a thread could tell a scheduler not to give it any CPU time until the lock is released; and this is how a mutex works.

Volatile vars and multi-core thread synchronization!

I have several threads executing concurrently and checking a value of a field in their own object. The field is set by the launch thread like this:
for (i = 0; i < ThreadCount; i++)
{
ThreadContext[i].MyField = 1;
}
Within each thread then I check the value of this value:
if (MyField == 1)
{
...//do something
}
However, I noticed that on a 4 core CPU, some of the (4) running threads need several miliseconds or even longer in order to see the changed MyField. MyField is a single char field. What appears to be happening is that when the memory bus is maxed out by the first thread which detects the change, all other threads may stall almost for the entire duration of the run of the first. (assuming there is enough memory pressure). Only when the first thread eases on memory (and does more with registers), is when other threads also get to see the new value.
I checked the asm and there is no compiler optimization in the way here. Calls go directly to memory. How can this be fixed?
Thanks!
Jam
I got feedback from Intel: Yes, that's how it works (no easy fix).

Is blocking on mutex equivalent to empty while cycle?

I'm writing a concurrency application for the iPhone.
I wonder if this code:
while(!conditionBoolean)
{
// do nothing
// until another thread makes this variable true.
}
makeWork();
Is equivalent to the following:
[lock lock]; // this lock is locked by another thread
// causing the current to block until it's unlocked
[lock unlock];
makeWork();
If it's not, what's the difference?
Thank you.
You should prefer the second, the first will produce a tight loop and delay or maybe even prevent the variable being set in the way you want/expect. At the very least you would have to introduce a delay in that loop, a sleep of some kind.
Better still would be to wait on a signalling primitive for the work to complete, which then gets signalled by the other thread - the design is then deterministic, versus depending on a mutex or state variable that some other thread might lock or modify before you get your chance. In general, it's better for a multi-threaded design to be event-driven (push model), not check shared state opportunistically (pull model).
My understanding of mutexes is that the lock can occur in less cycles, so for example it's possible that while you read the conditionboolean to become true, it's possible that another thread could still change it to true while you're reading it, and another goes to false before you read it again. This turns into a race condition, which the mutex locking would hope to avoid. Also this could cause your code not to be the "next in line" if you have numerous functions with a similar while loop.