I am trying to debug a complex Perl application that terminates with the error message "Signal SIGCHLD received, but no signal handler set". I know that it comes from the Perl interpreter itself, notably from the file mg.c and it cannot be caught. But I don't understand the exact nature of it.
Is this an internal error in the interpreter of the kind "should not happen"?
Or is there a (simple) way to reproduce this error with recent Perl versions?
I have already tried to reproduce it with the hints given in https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html by setting and unsetting a signal handler in an endless loop while constantly firing that signal from another script, also in an endless loop. But I could not reproduce the behavior described there with Perl versions 5.18.4, 5.26.0, 5.26.2, 5.28.2, and 5.30.2.
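The test consisted of two throwaway scripts along these lines (a rough sketch from memory, not the exact code):

#!/usr/bin/perl
# victim.pl - installs and removes a CHLD handler as fast as possible
use strict;
use warnings;

$| = 1;
print "pid $$\n";
while (1) {
    $SIG{CHLD} = sub { };    # install a handler
    delete $SIG{CHLD};       # and immediately remove it again
}

#!/usr/bin/perl
# attacker.pl - floods the first script with SIGCHLD
use strict;
use warnings;

my $pid = shift @ARGV or die "usage: $0 PID\n";
while (1) {
    kill 'CHLD', $pid or die "kill: $!\n";
}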
I have also found "Signal SIGSTOP received, but no signal handler set in perl script", where somebody assigned to $SIG{SIGSTOP} instead of $SIG{STOP}, but that also does not help to make the problem reproducible with a simple script.
All of the perls I have tested were built without thread support:
$ perl -Mthreads
This Perl not built to support threads
Compilation failed in require.
BEGIN failed--compilation aborted.
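A quicker check that does not attempt to load threads.pm is perl -V:useithreads, which should report something like this on a build without interpreter threads:

$ perl -V:useithreads
useithreads='undef';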
I am answering my own question here (to the best of my knowledge to date):
The error vanished by inserting these two lines:
$SIG{CHLD} ||= 'DEFAULT';
$SIG{HUP} ||= 'DEFAULT';
I would not call this a fix but rather a workaround, because a value of "DEFAULT" should trigger exactly the same behavior as no value at all (undef).
The error message is an internal error of Perl. The interpreter bails out here as a guard against signal handling bugs in Perl.
That being said, there is also no simple example that reproduces the error. And if there were one, it would be a bug in Perl.
A similar error was reported for GNU parallel some time ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html
The error reported there has a couple of things in common with the error that I encountered, notably that it occurred after fork()ing.
My application is a server based on Net::Server, and the error occurs when a request handler spawns a child process. Interestingly, the error message (and the exit) happens before the child terminates.
The child process can potentially run for a very long time. Therefore it is made a session leader with setsid(), all open file descriptors are closed, and standard input is redirected to /dev/null before exec() is called. In other words, it is kind of daemonized.
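In outline, the child setup looks roughly like this (a simplified sketch; the real code sits inside the Net::Server request handler, and /usr/local/bin/long-runner stands in for the actual command):

use POSIX qw(setsid);

my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # child: detach from the server's session
    die "setsid: $!" if setsid() == -1;

    # the real code also closes every other inherited descriptor;
    # here only the standard streams are pointed at /dev/null
    open STDIN,  '<', '/dev/null' or die "STDIN: $!";
    open STDOUT, '>', '/dev/null' or die "STDOUT: $!";
    open STDERR, '>', '/dev/null' or die "STDERR: $!";

    exec '/usr/local/bin/long-runner' or die "exec: $!";
}
# parent: goes back to serving requests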
It should also be noted that the error vanishes when small modifications are made to the code, for example dumping the contents of %SIG for debugging purposes.
The error also did not occur with Perl versions 5.8.9, 5.14.4, and 5.16.3. With 5.18.4, 5.26.2, and 5.30.2 it can always be reproduced. All of these executables had been built without interpreter thread support.
The "but no signal handler set" message is particular to threads, and signal handling works differently inside threads than it does in your main process.
The main process can receive a signal from any number of places -- a kill call in the program, a kill call from another Perl program, from the operating system, from a /usr/bin/kill command on the command line, etc. Perl has "default" signal handlers for all of the signals that can be sent from your system. You can trap some of them by setting up handlers in the global %SIG variable, though some signals (notably SIGKILL and SIGSTOP) cannot be trapped by your program.
Signals within threads are an emulation of signals in your parent operating system. They can only be sent by a call to threads::kill, which means they can only be signalled from within your Perl program. Perl does not set up any default signal handlers for thread signals, and it warns you when an unhandled signal is delivered to a thread. And unlike the regular signal set, you may trap KILL and STOP signals sent to threads.
To be defensive, set a signal handler for all signals. Maybe something like:
use threads;
use Carp;
sub thr_signal_handler {
    Carp::cluck("Received SIG$_[0] in thread ", threads->tid(), "\n");
}
# inside thread initialization function
$SIG{$_} = \&thr_signal_handler for keys %SIG;
...
I know that an interrupt causes the OS to switch a CPU from its current task to run a kernel routine. In this case, the system has to save the current context of the process running on the CPU.
However, I would like to know whether or not a context switch occurs when any random process makes a system call.
I would like to know whether or not a context switch occurs when any random process makes a system call.
Not precisely. Recall that a process can only make a system call if it's currently running -- there's no need to make a context switch to a process that's already running.
If a process makes a blocking system call (e.g., sleep()), there will be a context switch to the next runnable process, since the current process is now sleeping. But that's another matter.
There are generally two ways to cause a context switch: (1) a timer interrupt invokes the scheduler, which forcibly makes a context switch, or (2) the process yields. Most operating systems have a number of system services that will cause the process to yield the CPU.
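On Linux you can watch both kinds from a small script by reading the per-process context switch counters in /proc (a rough illustration; the counters are Linux-specific):

use strict;
use warnings;

sub ctxt_switches {
    open my $fh, '<', '/proc/self/status' or die "/proc/self/status: $!";
    my %n;
    while (<$fh>) {
        $n{$1} = $2 if /^(voluntary_ctxt_switches|nonvoluntary_ctxt_switches):\s+(\d+)/;
    }
    return @n{qw(voluntary_ctxt_switches nonvoluntary_ctxt_switches)};
}

my ($vol0, $invol0) = ctxt_switches();
sleep 1;    # blocking system call: the process yields the CPU
my ($vol1, $invol1) = ctxt_switches();
printf "voluntary: +%d, involuntary: +%d\n", $vol1 - $vol0, $invol1 - $invol0;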
Well, I get your point, so first let me clear up a very basic idea about system calls.
When a process/program makes a syscall, it traps into the kernel, which invokes the syscall handler: the kernel stack is loaded (via the TSS on x86) and control jumps through the syscall function table.
See, it's actually the same as running a different part of that program itself; the only major change is that the kernel plays a role here and that piece of code is executed in ring 0.
Now to your question: "what will happen if a context switch happens while a random process is making a syscall?"
Well, nothing will happen. Things will work the same way as they were working earlier. Instead of holding the normal user-mode addresses, that random process's TSS will supply the kernel stack address, and execution continues at the entry taken from the syscall function table.
In OpenSSL, the man pages for the majority of SSL_* calls indicate an error by returning a value <= 0 and suggest calling SSL_get_error() to get the extended error.
But within the man pages for these calls, as well as for other OpenSSL library calls, there are vague references to using the "error queue" in OpenSSL. Such is the case in the man page for SSL_get_error:
The current thread's error queue must be empty before the TLS/SSL I/O
operation is attempted, or SSL_get_error() will not work reliably.
And in that very same man page, the description for SSL_ERROR_SSL says this:
SSL_ERROR_SSL
A failure in the SSL library occurred, usually a protocol error.
The OpenSSL error queue contains more information on the error.
This kind of implies that there is something in the error queue worth reading. And failure to read it makes a subsequent call to SSL_get_error unreliable. Presumably, the call to make is ERR_get_error.
I plan to use non-blocking sockets in my code. As such, it's important that I reliably discover when the error condition is SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE so I can put the socket in the correct polling mode.
So my questions are these:
Does SSL_get_error() call ERR_get_error() implicitly for me? Or do I need to use both?
Should I be calling ERR_clear_error prior to every OpenSSL library call?
Is it possible that more than one error could be in the queue after an OpenSSL library call completes? Hence, are there circumstances where the first error in the queue is more relevant than the last error?
SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error, the error stays in the queue.
You should be calling ERR_clear_error prior to ANY SSL call (SSL_read, SSL_write, etc.) that is followed by SSL_get_error; otherwise you may be reading an old error that occurred previously in the current thread.
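If you happen to drive OpenSSL from Perl, the same clear-then-check pattern can be sketched with Net::SSLeay, which wraps the underlying OpenSSL calls more or less one-to-one (this assumes a reasonably recent Net::SSLeay that exposes ERR_clear_error(); $ssl is a placeholder for an established non-blocking connection):

use Net::SSLeay;

sub ssl_write_checked {
    my ($ssl, $data) = @_;

    # empty this thread's error queue before the I/O call
    Net::SSLeay::ERR_clear_error();

    my $ret = Net::SSLeay::write($ssl, $data);
    return $ret if $ret > 0;

    my $err = Net::SSLeay::get_error($ssl, $ret);    # SSL_get_error()
    if ($err == Net::SSLeay::ERROR_WANT_READ()) {
        # poll the socket for readability, then retry the same write
    }
    elsif ($err == Net::SSLeay::ERROR_WANT_WRITE()) {
        # poll the socket for writability, then retry
    }
    else {
        # drain the queue; there may be more than one entry
        while (my $e = Net::SSLeay::ERR_get_error()) {
            warn Net::SSLeay::ERR_error_string($e), "\n";
        }
    }
    return $ret;
}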
I'm concerned about data corruption while executing a Storable::store operation. I'm writing about 100 MB to an NFS share to back up my calculations at particular checkpoints, but the process isn't exactly speedy.
To try to prevent corruption, I have a $SIG{INT} signal handler. Right before Storable::store is called, a global variable is set indicating that it isn't safe to terminate. As soon as Storable::store completes, that global variable is set back to a value indicating it's okay to interrupt.
That global variable determines whether the signal handler will call die or just print a statement saying "Can't stop yet."
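The setup looks roughly like this (a condensed sketch; %results and $checkpoint_file stand in for the real data):

use Storable;

my $unsafe_to_exit = 0;     # the global flag described above

$SIG{INT} = sub {
    if ($unsafe_to_exit) {
        print STDERR "Can't stop yet.\n";
        return;
    }
    die "Interrupted\n";
};

$unsafe_to_exit = 1;
Storable::store(\%results, $checkpoint_file);   # the slow write to NFS
$unsafe_to_exit = 0;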
But am I really helping things? I see now from reading perlipc that interrupting IO is sometimes done safely, and sometimes it is not... That is, should my signal handler end up being called in the middle of my Storable::store operation, even such a brief diversion to my signal handler subroutine may be enough to screw things up.
Does anyone know how Storable performs in such a situation? Or is my signal handling setup actually appropriate?
Since 5.8.1, Perl uses "safe signals" by default. When you set up a signal handler through %SIG, Perl actually installs a simple signal handler that does nothing but increment a counter. In between Perl ops, Perl checks whether the counter is non-zero, and calls your signal handler if it is. That way, your signal handlers don't execute in the middle of a system call or library call.
There are only two things you need to worry about:
Modifying global vars (e.g. $!) in your signal handler.
System calls returning EINTR or EAGAIN because a signal came in during the call (see the retry sketch below).
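For the second point, the usual idiom is to retry the call when $! is EINTR, along these lines ($fh is a placeholder for whatever handle you are reading from):

use Errno qw(EINTR);

my ($buf, $n);
READ: {
    $n = sysread $fh, $buf, 8192;
    if (!defined $n) {
        redo READ if $! == EINTR;   # interrupted by a signal: just retry
        die "sysread failed: $!";
    }
}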
If you're really concerned that SIGINT could break store, try adding SA_RESTART to your signal handler. On Linux, this will force the system calls to be automatically retried upon a signal. This will probably defer your signal handler indefinitely since SIGINT will not be able to break out of the I/O operation. However, it should enable safe operation of Storable::store under any interruption circumstance.
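Note that plain %SIG assignments do not let you pass sigaction() flags; a hedged sketch using POSIX::sigaction to request SA_RESTART might look like this:

use POSIX qw(sigaction SIGINT SA_RESTART);

my $please_stop = 0;
my $action = POSIX::SigAction->new(
    sub { $please_stop = 1 },    # handler only sets a flag
    POSIX::SigSet->new,          # no extra signals blocked while handling
    SA_RESTART,                  # ask the kernel to restart interrupted syscalls
);
$action->safe(1);                # keep Perl's deferred ("safe") delivery
sigaction(SIGINT, $action) or die "sigaction: $!";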
I have a perl script which calls fork() a few times to create 4 child processes. The parent process then uses waitpid() to wait for all of the children to finish.
The problem occurs when I try to call system() from within the child processes (I'm using it to create directories). Even something as simple as system("dir") fails (yes, I'm on Windows).
By "fails", I mean that one of the child threads goes past it no problem, but the other child processes so far as I can tell simply cease to exist.
trace INFO, "Thread $thread_id still alive at 2.62";
system("dir");
trace INFO, "Thread $thread_id still alive at 2.65";
I get messages such as "Thread 3 still alive at 2.62", but only one of the child threads ever gets to 2.65.
At the bottom of the log, I can see "Command exited with non-zero status 127", which I think may have something to do with it.
I've considered using some sort of mutex lock to make sure that only one process at a time goes through the system calls, but how can I do that with fork()? Also, this problem doesn't really make any sense in the first place: why would several independent processes have trouble doing system("dir") at the same time?
The problem here is that fork() is emulated under Windows using threads, so no real processes are created.
If you are only using the system call to create folders, then you'd be better off using Perl's built-in mkdir or File::Path's make_path instead.
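A hedged sketch of what the child could do instead of shelling out (the path is made up for the example, and $thread_id is the variable from the question):

use File::Path qw(make_path);

# inside the child after fork(): create the directory tree directly,
# without going through cmd.exe or system()
my $dir = "results/part_$thread_id";               # placeholder path
make_path($dir, { error => \my $errors });
if (@$errors) {
    warn "Thread $thread_id: could not create $dir\n";
}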
I want to write a robust daemon in Perl that will run on Linux and am following the template described in this excellent answer. However, there are a few differences in my situation: first, I am using Parallel::ForkManager's start() and next; to fork on an event, immediately followed by exec('handle_event.pl').
In such a situation, I have the following questions:
Where should I define my signal handlers? Should I define them in the parent (the daemon) and assume that they will be inherited by the children?
If I run exec('handle_event.pl'), will the handlers get inherited across the exec (I know that they are inherited across the fork)?
If I define a new signal handler in handle_event.pl, will this definition override the one defined in the parent?
What are best practices in a situation like this?
Thank you
When you fork, the child process has the same signal handlers as the parent. When you exec, any ignored signals remain ignored; any handled signals are reset back to the default handler.
The exec replaces the whole process code with the code that will be executed. As signal handlers are code in the process image, they cannot be inherited across an exec, so exec will reset the signal handling dispositions of handled signals to their default states (ignored signals will remain ignored). You will therefore need to install any signal handling in the execed process when it starts up.
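A hedged sketch of the resulting pattern (next_event() and the handler bodies are placeholders): the daemon installs its own handlers, and handle_event.pl has to install handlers again after the exec, since only ignored dispositions survive.

# daemon.pl (the parent)
use strict;
use warnings;
use Parallel::ForkManager;

my $shutdown = 0;
$SIG{TERM} = sub { $shutdown = 1 };   # handled: reset to DEFAULT across exec
$SIG{PIPE} = 'IGNORE';                # ignored: stays ignored across exec

my $pm = Parallel::ForkManager->new(4);
until ($shutdown) {
    my $event = next_event() or next; # placeholder for the event source
    $pm->start and next;              # parent: back to waiting for events
    exec('handle_event.pl') or die "exec failed: $!";
}

# handle_event.pl (the exec'ed program) must set up its own handlers:
$SIG{TERM} = sub { exit 0 };          # not inherited across the exec
# $SIG{PIPE} is still ignored here, because IGNORE survives exec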