Does Perl safely defer INT signals during a Storable write?

I'm concerned about data corruption while executing a Storable::store operation. I'm writing about 100 MB to an NFS mount to back up my calculations at particular checkpoints, but the process isn't exactly speedy.
To try to prevent corruption, I have a $SIG{INT} handler. Right before Storable::store is called, a global variable is set to indicate that it isn't safe to terminate. As soon as Storable::store completes, that global variable is set back to a value indicating it's okay to interrupt.
The signal handler checks that global variable to decide whether to call die or to just print a message saying "Can't stop yet."
But am I really helping things? Reading perlipc, I see that interrupting I/O is sometimes safe and sometimes not. That is, if my signal handler ends up being called in the middle of my Storable::store operation, even a brief diversion into the handler subroutine may be enough to corrupt the write.
Does anyone know how Storable performs in such a situation? Or is my signal handling setup actually appropriate?

Since 5.8.1, Perl uses "safe signals" (by default). When you set up a signal handler through %SIG, Perl actually installs a simple signal handler that does nothing but increment a counter. In between Perl ops, Perl checks whether the counter is non-zero and, if it is, calls your signal handler. That way, your signal handlers don't execute in the middle of a system call or library call.
There are only two things you need to worry about:
Modifying global vars (e.g. $!) in your signal handler.
System calls returning EINTR or EAGAIN because a signal came in during the call.
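For reference, here is a minimal sketch of the flag-based deferral described in the question, written against that deferred-delivery model. The variable names ($BUSY, $PENDING) and the checkpoint() wrapper are illustrative, not from the original post:

use Storable ();

my $BUSY    = 0;   # true while a checkpoint is being written
my $PENDING = 0;   # remembers that SIGINT arrived while we were busy

$SIG{INT} = sub {
    local $!;                      # don't clobber errno seen by interrupted code
    if ($BUSY) {
        $PENDING = 1;
        warn "Can't stop yet.\n";  # defer instead of dying mid-write
    }
    else {
        die "Interrupted\n";
    }
};

# Hypothetical checkpoint wrapper around the slow NFS write.
sub checkpoint {
    my ($data, $file) = @_;        # $data must be a reference
    $BUSY = 1;
    Storable::store($data, $file); # the handler only runs between Perl ops
    $BUSY = 0;
    die "Interrupted\n" if $PENDING;  # honor a deferred SIGINT afterwards
}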

If you're really concerned that SIGINT could break store, try adding SA_RESTART to your signal handler. On Linux, this will force system calls to be automatically retried after a signal. It will probably defer your signal handler indefinitely, since SIGINT will no longer be able to break out of the I/O operation; however, it should allow Storable::store to run safely no matter when the interrupt arrives.
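A minimal sketch of installing the INT handler with SA_RESTART through the POSIX module; the handler body is a placeholder, and safe(1) keeps Perl's deferred ("safe") delivery for the Perl-level handler:

use POSIX qw(SIGINT SA_RESTART);

my $handler = sub { warn "Can't stop yet.\n" };   # placeholder handler

my $action = POSIX::SigAction->new($handler, POSIX::SigSet->new, SA_RESTART);
$action->safe(1);    # keep Perl's deferred signal delivery for this handler
POSIX::sigaction(SIGINT, $action) or die "sigaction failed: $!";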

Related

Error message "Signal SIGCHLD received, but no signal handler set."

I am trying to debug a complex Perl application that terminates with the error message "Signal SIGCHLD received, but no signal handler set". I know that it comes from the Perl interpreter itself, notably from the file mg.c, and that it cannot be caught. But I don't understand the exact nature of it.
Is this an internal error in the interpreter of the kind "should not happen"?
Or is there a (simple) way to reproduce this error with recent Perl versions?
I have already tried to reproduce it with hints given in https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html by setting and unsetting a signal handler in an endless loop while constantly firing that signal from another script, also in an endless loop. But I could not reproduce the behavior described there with Perl versions 5.18.4, 5.26.0, 5.26.2, 5.28.2, and 5.30.2.
I have also found the question "Signal SIGSTOP received, but no signal handler set in perl script", where somebody assigned to $SIG{SIGSTOP} instead of $SIG{STOP}, but that also does not help to make the problem reproducible with a simple script.
All of the perls I have tested were built without thread support:
$ perl -Mthreads
This Perl not built to support threads
Compilation failed in require.
BEGIN failed--compilation aborted.
I am answering my own question here (to the best of my knowledge to date):
The error vanished by inserting these two lines:
$SIG{CHLD} ||= 'DEFAULT';
$SIG{HUP} ||= 'DEFAULT';
I would not call this a fix but rather a workaround, because the value "DEFAULT" should trigger exactly the same behavior as no value (i.e. undef).
The error message is an internal error of Perl. The interpreter bails out here as a guard against signal handling bugs in Perl.
That being said, there is also no simple example that reproduces the error. And if there were one, it would be a bug in Perl.
A similar error was reported for GNU parallel some time ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html
The error reported there has a couple of things in common with the error that I encountered, notably that it occurred after fork()ing.
My application is a server based on Net::Server, and the error occurs when a request handler spawns a child process. Interestingly, the error message (and exit) happens before the child terminates.
The child process can potentially run for a very long time. Therefore it is made a session leader with setsid(), all open file descriptors are closed, and standard input is redirected to /dev/null before exec() is called. In other words, it is kind of daemonized.
It should also be noted that the error vanishes when small modifications are made to the code, for example dumping the contents of %SIG for debugging purposes.
The error also did not occur with Perl versions 5.8.9, 5.14.4, and 5.16.3. With 5.18.4, 5.26.2, and 5.30.2 it can always be reproduced. All of these executables had been built without interpreter thread support.
The "but no signal handler set" message is particular to threads, and signal handling works differently inside threads than it does in your main process.
The main process can receive a signal from any number of places: a kill call in the program, a kill call from another Perl program, from the operating system, from the /usr/bin/kill command on the command line, etc. Perl has "default" signal handlers for all of the signals that can be sent from your system. You can trap some of them by setting up handlers in the global %SIG variable, though some signals (notably SIGKILL and SIGSTOP) cannot be trapped by your program.
Signals within threads are an emulation of signals in the parent operating system. They can only be sent by a call to threads::kill, which means they can only be raised from within your Perl program. Perl does not set up any default signal handlers for thread signals, and warns you when an unhandled signal is delivered to a thread. And unlike the regular signal set, you may trap KILL and STOP signals sent to threads.
To be defensive, set a signal handler for all signals. Maybe something like:
use Carp;
sub thr_signal_handler {
    Carp::cluck("Received SIG$_[0] in thread ", threads->tid(), "\n");
}
# inside thread initialization function
$SIG{$_} = \&thr_signal_handler for keys %SIG;
...

How does the computer implement callbacks?

I already know the general usage of callbacks. First, I register a "callback function"; when some event occurs, this function will be triggered (executed).
What confuses me is how I know that the event has occurred. The only solution I can come up with is polling. Is there a better way to check whether the event has occurred, in less than O(n) time?
All right, maybe the above question is too abstract. A more concrete version: does epoll_wait avoid spending O(n) time to find the ready file descriptors?
If so, how does it do it?
Is there a callback mechanism that is essentially different from polling?
Usually, but not exclusively, callbacks get called after some peripheral I/O device signals completion of an operation by raising a hardware interrupt. A long chain of machinery involving things like driver interrupt handlers, semaphores, protection ring changes, thread and process context switches, and message assembly/enqueueing/dequeueing/handling/dispatching then causes your callback to be called, maybe by some system thread, or from a message-handling or signal-handling thread of your own that has to conform to a specific structure or constraint.
So no, polling is generally unnecessary, and unwanted.
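To make the register-and-dispatch idea concrete without busy-polling, here is a small Perl sketch using IO::Select (select rather than epoll, and the port number is arbitrary). The program maps each watched handle to a callback and then blocks until the kernel reports readiness:

use IO::Select;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(Listen => 5, LocalPort => 12345)
    or die "listen: $!";

# "Register a callback": map each watched descriptor to a code ref.
my %callback = (
    fileno($listener) => sub {
        my $conn = $listener->accept;
        print "accepted connection from ", $conn->peerhost, "\n";
    },
);

my $sel = IO::Select->new($listener);
while (my @ready = $sel->can_read) {   # blocks in the kernel; no CPU-burning loop
    $callback{ fileno($_) }->() for @ready;
}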

How to cancel a process waiting to return with ptrace()

I am trying to cancel a process's blocking calls such as wait(), read(), or recvfrom(), because when I use ptrace on it, after PTRACE_ATTACH and a later PTRACE_CONT, my tracer stays blocked until the call in the tracee returns. I think the same happens with sleep().
Would it be possible to cancel the call, or fake a return from it?
Thanks.
Yes, you can send a PTRACE_INTERRUPT. This will cause the blocked syscall to exit.
To do this, you must not be blocked in waitpid on your tracee, because that would block you (the tracer) too.
You can have multiple threads: one that blocks on the tracee, and one that "decides" to cancel the blocking syscall, e.g. a GUI thread in which the user presses "cancel" (as in a normal debugger such as GDB).
Alternatively, you can use PTRACE_SYSCALL to inspect every syscall the program makes and decide ahead of time whether you want it to execute. This way you can decide not to run wait at all, or perhaps mock it by supplying your own return value instead.

Do both traps and interrupts give control of the hardware to the CPU

I am very confused about whether both traps and interrupts give control of the hardware to the CPU.
Can someone explain whether this holds or not?
I think it would be more accurate to say that both traps and interrupts get processed by an interrupt handler (there is a trap handler and an interrupt handler, but it's the same concept).
The handler then processes the raised event and attempts to resolve it. With a trap it may be something like a division by 0, and with an interrupt it could be something like the disk having finished writing a file.
In some cases the trap may be "intentional" - this is useful if your program requires some resources it doesn't have and wants to request them. It raises an exception (trap) and attempts to initiate a context switch to another process while it waits for its resources (no point in hogging the CPU if it's just waiting).
So as you can see, an interrupt can necessitate hardware control but a trap (context switch) may not necessitate hardware use.
I think the best way to view a fault/trap/interrupt is as a function call. The operating system sets up a vector of handlers for the different events. When they occur, the CPU calls the appropriate function.
The only oddity is that an interrupt can occur asynchronously. Faults and traps occur as a result of the instruction stream.
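As a loose illustration of the "vector of handlers" analogy, here is a small Perl dispatch-table sketch; the event names are made up, and the table merely stands in for the vector the OS sets up:

# The hash stands in for the interrupt/trap vector set up by the OS.
my %vector = (
    DIVIDE_ERROR => sub { warn "trap: division by zero\n" },          # synchronous
    PAGE_FAULT   => sub { warn "fault: page fault at $_[0]\n" },      # synchronous
    DISK_IRQ     => sub { warn "interrupt: disk I/O completed\n" },   # asynchronous
);

# The "CPU": look up the handler for the event and call it.
sub raise {
    my ($event, @args) = @_;
    ($vector{$event} || sub { die "unhandled event: $event\n" })->(@args);
}

raise('DIVIDE_ERROR');   # raised by the instruction stream (trap/fault)
raise('DISK_IRQ');       # raised by a device, asynchronously (interrupt)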

calling system("command") from signal handler

In a signal handler, I saw system() used to invoke some shell commands, like
void
sig_handler(int signum) {
    system("command1");
    system("command2");
    system("command3");
    signal(signum, SIG_DFL);
}
Is it safe to make calls in this way in a signal handler (bound to SIGSEGV, SIGABRT, SIGBUS, ...), and then reinstate the default handler?
Or does it depend on the commands being invoked?
The system() function is not documented as safe to call from a signal handler (or at least I cannot find such documentation), so I'd conclude that this code is not guaranteed to be safe.
However, fork, execve, waitpid, and signal are all documented to be async-signal-safe, so I think the functionality of that code should be safe in principle, if implemented using fork/exec/waitpid instead of system.