Mutexes inside an interrupt service routine

In Linux, why can't we use a mutex inside an ISR to protect a shared resource?

Because a lock operation on a mutex can sleep and it's illegal to sleep in an ISR. Use a spinlock instead.
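For illustration, here is a minimal sketch of the usual Linux pattern: the ISR takes a plain spin_lock (interrupts are already off on that CPU), while process-context code uses spin_lock_irqsave so the ISR can never interrupt it while the lock is held and deadlock. The handler name and the shared counter are made up for the example:

#include <linux/interrupt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);
static unsigned long event_count;       /* shared between ISR and process context */

/* ISR: must not sleep, so a mutex is out; a spinlock is fine. */
static irqreturn_t my_isr(int irq, void *dev_id)
{
    spin_lock(&my_lock);                /* IRQs already disabled on this CPU */
    event_count++;
    spin_unlock(&my_lock);
    return IRQ_HANDLED;
}

/* Process context: also disable IRQs so my_isr can't deadlock against us. */
static unsigned long read_event_count(void)
{
    unsigned long flags, n;

    spin_lock_irqsave(&my_lock, flags);
    n = event_count;
    spin_unlock_irqrestore(&my_lock, flags);
    return n;
}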


Why disable interrupt before context switch

I was reading an OS textbook; in the synchronization chapter it says:
In particular, most implementations of thread systems enforce the invariant that a thread always disables interrupts before performing a context switch.
Hence Acquire(), before going to sleep, will first disable interrupts.
My question is: why must interrupts be disabled before a context switch? Is it used to protect the registers and to keep Acquire() atomic?
Acquire() is used before the critical section as:
Acquire() {
    disable interrupts;
    if (is busy) {
        put thread on wait queue;
        sleep();
    } else {
        set busy;
    }
    enable interrupts;
}
Going to sleep will perform a context switch. Why should we keep interrupts disabled during the context switch? Can we change the code to:
Acquire() {
    disable interrupts;
    if (is busy) {
        enable interrupts;
        put thread on wait queue;
        sleep();
    } else {
        set busy;
    }
    enable interrupts;
}
That is, enable interrupts in thread A itself, instead of letting the other thread B enable them after the context switch (after A goes to sleep)?
Typically, a synchronization primitive requires updating multiple data locations simultaneously. For example, a semaphore Acquire might require changing the state of the current thread to blocked, updating the count of the semaphore, removing the current thread from one queue and placing it on another. Since "simultaneously" isn't really possible (*), it is necessary to devise an access protocol to simulate it. In a single-CPU system, the easiest way to do this is to disable interrupts, perform the updates, then re-enable interrupts. All software following this protocol will see the updates at once.
Multi-CPU systems typically need something extra to keep threads on separate CPUs from interfering. Disabling interrupts is insufficient, since that only affects the current CPU. The something extra is typically a spin lock, which behaves much like a mutex or binary semaphore, except that the caller sits in a retry loop until the lock becomes available.
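To make the retry loop concrete, here is a minimal user-space illustration of a spin lock built on C11 atomics. A real kernel spinlock additionally disables preemption and deals with architecture-specific details, so treat this only as a sketch of the idea:

#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

void spin_lock(spinlock_t *l)
{
    /* Retry loop: atomically test-and-set until we are the one who set it. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;                               /* spin until the holder releases */
}

void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}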
Even in the multi-CPU system, the operation has to be performed with interrupts disabled. Imagine Thread#0 has acquired a spinlock on cpu#0; then an interrupt on cpu#0 causes Thread#1 to preempt it, and Thread#1 then attempts to acquire the same spinlock. Thread#1 spins forever, because Thread#0 can never run again to release the lock. There are many scenarios which amount to this.
(*) Transactional memory provides something like this, but with limited applicability, and the implementation has to provide an independent fallback path to ensure forward progress. Also, since transactions do not nest, they really need to disable interrupts as well.
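As for the question's proposed change: if Acquire() re-enabled interrupts before sleep(), a release coming from an interrupt handler could fire after the thread is on the wait queue but before it has actually slept, and the wakeup would be lost. Keeping interrupts disabled makes "enqueue + mark blocked + switch away" one atomic step. A rough single-CPU sketch, where irq_save()/irq_restore(), enqueue(), current_thread(), and schedule() are hypothetical stand-ins for whatever the kernel actually provides:

void Acquire(struct lock *lk)
{
    unsigned long flags = irq_save();   /* all updates below appear atomic */

    while (lk->busy) {
        enqueue(&lk->waiters, current_thread());
        current_thread()->state = BLOCKED;
        schedule();                     /* context switch with interrupts off;
                                           the scheduler restores the interrupt
                                           state of the thread it switches to */
        /* we resume here after being woken and rescheduled */
    }
    lk->busy = 1;
    irq_restore(flags);
}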

Mutex in RTOSes in this specific situation

Consider the following code:
/*----------------------------------------------------------------------------
 *  First Thread
 *---------------------------------------------------------------------------*/
void Thread1 (void const *argument)
{
    for (;;)
    {
        osMutexWait(mutex, osWaitForever);
        Thread1_Functions;
        osMutexRelease(mutex);
    }
}

/*----------------------------------------------------------------------------
 *  Second Thread
 *---------------------------------------------------------------------------*/
void Thread2 (void const *argument)
{
    for (;;)
    {
        osMutexWait(mutex, osWaitForever);
        Thread2_Functions;
        osMutexRelease(mutex);
    }
}
As far as I've noticed from RTOS scheduling, the RTOS assigns a specific time slice to each task, and when that time is over, it switches to the other task.
Within that time slice, the task's infinite loop may be repeated several times until the slice is finished.
Assume the loop body finishes in less than half the slice; then there is time to run it fully once again.
In the last line, after releasing the mutex, Thread1 will then acquire the mutex again before Thread2 can take it for the second time. Am I right?
Assume a timer tick occurs while the MCU runs Thread1_Functions for the second time. Then Thread2 can't run, because the mutex is owned by Thread1; the RTOS runs Thread1 again, and if a timer tick lands inside Thread1_Functions every time, Thread2 never gets a chance to run. Am I right?
First, let me clear up the scheduling method that you described. You said that the RTOS assigns a specific time to each task and, after that time is over, switches to the other task. This scheduling method is commonly called "time slicing", and not all RTOSes use it all the time. Time slicing may be used for tasks that have the same priority (or if the RTOS does not support task priorities), but if the tasks have different priorities then the scheduler will not time-slice and will instead schedule according to task priority.
But let's assume that the two tasks in your example have the same priority and the scheduler is time-slicing.
1. Thread1 runs and gets the mutex.
2. Thread1's time slice expires and the scheduler switches to Thread2.
3. Thread2 attempts to get the mutex but blocks, since Thread1 already owns the mutex.
4. The scheduler switches back to Thread1, since Thread2 is blocked.
5. Thread1 releases the mutex. When the mutex is released, the scheduler should switch to any higher priority task that is waiting for it. But since Thread2 has the same priority, let's assume the scheduler does not switch and Thread1 continues to run within its time slice.
6. Thread1 attempts to get the mutex again.
In your scenario Thread1 successfully gets the mutex again, and this could result in Thread2 never being able to run. To prevent this from happening, the mutex service should prioritize requests for the mutex. Mutex requests from higher priority tasks receive higher priority, and mutex requests from equal priority tasks should be served first come, first served. In other words, the mutex service should put requests from equal priority tasks into a queue. Remember, Thread2 already has a pending request for the mutex (step 3 above). So when Thread1 attempts to get the mutex again (step 6), Thread1's request should be queued behind the earlier request from Thread2. And when Thread1's second request gets queued behind the request from Thread2, the scheduler should block Thread1 and switch to Thread2, giving the mutex to Thread2.
Update: The above is just an idea for how an unspecified RTOS might handle the situation in order to avoid starving Thread2. You didn't mention a specific RTOS until your comment below. I don't know whether Keil RTX works like I described above. And now I'm wondering what your question really is.
Are you asking what will Keil RTX do in this situation? I'm not sure. You'll have to look at the code for osMutexRelease() to see whether it switches to a task with the same priority. Also look at osMutexWait() to see how it prioritizes tasks of the same priority.
Or are you stating that Keil RTX allows Thread2 to starve in this situation and asking how to fix it? To fix it, you could call osThreadYield() after releasing the mutex, like this:
void Thread1 (void const *argument)
{
    for (;;)
    {
        osMutexWait(mutex, osWaitForever);
        Thread1_Functions;
        osMutexRelease(mutex);
        osThreadYield();
    }
}

Does Perl safely defer INT signals during a Storable write?

I'm concerned about data corruption while executing a Storable::store operation. I'm writing about 100 MB to an NFS mount to back up my calculations at particular checkpoints, but the process isn't exactly speedy.
To try to prevent corruption, I have a SIG{INT} signal handler. Right before Storable::store is called, a global variable is set indicating it isn't safe to terminate. As soon as Storable::store completes, that global variable is reset to a value indicating it's okay to interrupt.
That global variable is used to decide whether the signal handler will call die or just print a statement saying "Can't stop yet."
But am I really helping things? I see now from reading perlipc that interrupting I/O is sometimes done safely and sometimes not. That is, should my signal handler end up being called in the middle of my Storable::store operation, even such a brief diversion to my signal handler subroutine may be enough to screw things up.
Does anyone know how Storable performs in such a situation? Or is my signal handling setup actually appropriate?
Since 5.8.1, Perl uses "safe signals" by default. When you set up a signal handler through %SIG, Perl actually installs a simple signal handler that does nothing but increment a counter. In between Perl ops, Perl checks whether the counter is non-zero and calls your signal handler if it is. That way, your signal handlers don't execute in the middle of a system call or library call.
There are only two things you need to worry about:
Modifying global vars (e.g. $!) in your signal handler.
System calls returning EINTR or EAGAIN because a signal came in during the call.
If you're really concerned that SIGINT could break store, try adding SA_RESTART to your signal handler. On Linux, this will force the system calls to be automatically retried upon a signal. This will probably defer your signal handler indefinitely since SIGINT will not be able to break out of the I/O operation. However, it should enable safe operation of Storable::store under any interruption circumstance.
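For illustration, the flag-based setup described in the question looks roughly like this ($data and $file are placeholders); because of safe signals, the handler body only ever runs between Perl ops, never in the middle of one:

use Storable ();

my $data    = { results => [] };        # placeholder checkpoint data
my $file    = 'checkpoint.stor';        # placeholder path
my $storing = 0;                        # true while a checkpoint write is in flight

$SIG{INT} = sub {
    if ($storing) {
        warn "Can't stop yet, checkpoint in progress.\n";
        return;                         # defer: ignore this interrupt for now
    }
    die "Interrupted\n";
};

$storing = 1;
Storable::store($data, $file);          # the deferred handler can't fire mid-op
$storing = 0;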

The reason why task deletion in uC/OS should not occur during an ISR

I'm modifying some functionality (mainly scheduling) of uC/OS-II,
and I found out that the OSTaskDel function does nothing when it is called from an ISR.
Though I have learned some basic features of OSes, I really don't understand why that should be prohibited.
All it does is withdraw the task from the ready list and release acquired resources such as the TCB or semaphores...
Is there any reason for this to be banned while handling an interrupt?
It is not clear from the documentation why it is prohibited in this case, but OSTaskDel() explicitly calls OS_Sched(), and in an ISR this should only happen when the outer-most nested interrupt handler exits (handled by OSIntExit()).
I don't think the following is advisable, because there may be other reasons why this is prohibited, but you could remove the:
if (OSIntNesting > 0) {
    return (OS_TASK_DEL_ISR);
}
then make the OS_Sched() call conditional as follows:
if (OSIntNesting == 0) {
    OS_Sched();
}
If this dies horribly, remember I said it was ill-advised!
This operation will extend your interrupt processing time in any case, so it is probably a bad idea for that reason alone.
It is a bad idea in general (not just from an ISR) to asynchronously delete another task regardless of that task's state or resource usage. uC/OS-II provides the OSTaskDelReq() function to manage task deletion in a way that allows a task to delete itself on request and therefore be able to correctly release all its resources. Even without that, sending a request via the task's normal IPC mechanisms is usually better (and more portable).
If a task is not designed for self-deletion on demand, then you might simply use OSTaskSuspend() instead.
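For reference, the cooperative deletion pattern from the uC/OS-II documentation looks roughly like this; the priority constant and the cleanup step are placeholders:

#include "ucos_ii.h"

/* Requester (a task, not an ISR): ask the victim task to delete itself. */
void RequestDeletion (void)
{
    INT8U err = OSTaskDelReq(VICTIM_TASK_PRIO); /* VICTIM_TASK_PRIO: placeholder */
    (void)err;                                  /* check for OS_TASK_NOT_EXIST etc. */
}

/* The victim task polls for the request and cleans up before dying. */
void VictimTask (void *p_arg)
{
    for (;;) {
        /* ... normal work ... */
        if (OSTaskDelReq(OS_PRIO_SELF) == OS_TASK_DEL_REQ) {
            /* release semaphores, free buffers, etc. (placeholder) */
            OSTaskDel(OS_PRIO_SELF);            /* delete ourselves; never returns */
        }
    }
}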
Generally, you cannot do a few things in ISRs:
block on a semaphore and the like
block while acquiring a spin lock, if it's a single-CPU system
cause a page fault that has to be resolved by the virtual memory subsystem (one with on-disk backing store, that is)
If you do any of the above in an ISR, you'll have a deadlock.
OSTaskDel() is probably doing some of those things.

Restrictions while kernel is running an ISR routine

What are some of the important dos and don'ts in kernel mode and in an ISR routine?
For example:
Is context switching disabled while running an interrupt handler?
Can a context switch happen while a process is inside a critical section?
What circumstances in kernel mode merit disabling further interrupts?
How come a process switch can occur on a page fault, where a process fetches data from the disk, but not during other kinds of interrupts?
How do you determine whether an execution path can be interrupted/rescheduled/preempted?
What else does one have to remember when a process is in kernel mode or handling an ISR routine?
In short: NO CONTEXT SWITCH, EVER.
This means:
No preemption
No locks on mutexes (use spin locks instead and ensure your non-ISR counterparts acquire them with spin_lock_irqsave to disable IRQs)
No calls to any kernel function that can sleep (check the function's documentation; some functions also have _cansleep variants).
A process switch can occur on a page fault, but it happens after the corresponding ISR has been processed. Basically, a path can be rescheduled if it is not an ISR and if it does not hold a spinlock. If you hold a spinlock, you must avoid sleeping until it is released.
Since ISRs are so restricted, the handling of IRQs is usually split between a top half (which runs in ISR context and does the time-critical work) and a bottom half (which runs later as a kernel thread, can sleep, and does whatever can be deferred). See this page for more information:
http://www.makelinux.net/ldd3/chp-10-sect-4
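As a sketch of that split, the top half below only captures data under a spinlock and defers the rest to a workqueue item, which runs later in a kernel thread and is allowed to sleep. read_hw() and the processing step are hypothetical placeholders:

#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

static DEFINE_SPINLOCK(sample_lock);
static int latest_sample;               /* shared between ISR and work handler */

static int read_hw(void);               /* hypothetical: read the device register */

/* Bottom half: runs in process context (a kernel thread), so it may sleep. */
static void sample_work_fn(struct work_struct *work)
{
    unsigned long flags;
    int sample;

    spin_lock_irqsave(&sample_lock, flags);
    sample = latest_sample;
    spin_unlock_irqrestore(&sample_lock, flags);

    /* process_sample(sample): placeholder for work that may sleep */
}
static DECLARE_WORK(sample_work, sample_work_fn);

/* Top half: ISR context, no sleeping; do the minimum and defer the rest. */
static irqreturn_t sample_isr(int irq, void *dev_id)
{
    spin_lock(&sample_lock);            /* IRQs already off on this CPU */
    latest_sample = read_hw();
    spin_unlock(&sample_lock);

    schedule_work(&sample_work);        /* queue the bottom half */
    return IRQ_HANDLED;
}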