How can I attach a debugger to a running Perl process?

I have a running Perl process that's stuck, and I'd like to poke inside with a debugger to see what's wrong. I can't restart the process. Can I attach a debugger to the running process? I know I can do gdb -p, but gdb by itself does not help me. I've tried Enbugger, but failed:
$ perl -e 'while (1) {}'&
[1] 86836
$ gdb -p 86836
…
Attaching to process 86836.
Reading symbols for shared libraries . done
Reading symbols for shared libraries ............................. done
Reading symbols for shared libraries + done
0x000000010c1694c6 in Perl_pp_stub ()
(gdb) call (void*)Perl_eval_pv("require Enbugger;Enbugger->stop;",0)
perl(86836) malloc: *** error for object 0x3: pointer being realloc'd was not allocated
*** set a breakpoint in malloc_error_break to debug
Program received signal SIGABRT, Aborted.
0x00007fff8269d82a in __kill ()
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (Perl_eval_pv) will be abandoned.
(gdb)
Am I doing it wrong? Are there other options?
P.S. If you think you could benefit from a debugger attached to a running process yourself, you can insert a debugger back door triggered by SIGUSR1:
use Enbugger::OnError 'USR1';
Then you can simply kill -USR1 pid and your process will jump into the debugger.
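For example, a minimal sketch of a long-running script with that back door installed (the work loop itself is hypothetical):
#!/usr/bin/perl
use strict;
use warnings;
use Enbugger::OnError 'USR1';   # SIGUSR1 now drops the process into the perl debugger

while (1) {
    # ... the real work of the process goes here ...
    sleep 1;
}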

First, please use a DEBUGGING perl if you want to inspect it with gdb.
Please define "stuck": busy or non-busy waiting (high or low CPU)? Eating memory or not?
With `while (1) {}` it is busy waiting. I usually get busy waiting (endless cycles) on HV corruption in Perl_hfree_next_entry() since 5.15. Non-busy waiting is usually waiting on a blocking IO read.
With a DEBUGGING perl I then get the correct frame:
`0x00007fba15ab35c1 in Perl_runops_debug () at dump.c:2266`
`2266 } while ((PL_op = PL_op->op_ppaddr(aTHX)));`
and can inspect everything, much more than with a simple perl debugger. With a non-threaded perl you have to type less.
`(gdb) p Perl_op_dump(PL_op)`
and so on.
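For instance (a sketch; with a non-threaded perl no aTHX argument is needed):
`(gdb) p Perl_sv_dump(*PL_stack_sp)`   # dump the SV on top of the Perl stack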
If you are dealing with the perl internals: inside the pp_stub function it is not a good idea to enter the Enbugger runloop; you should be in the main runloop in dump.c. Set a breakpoint at the line shown above.
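A sketch of that session (the dump.c line number is the one from the frame quoted above and will differ between perl versions; non-threaded perl assumed, as above):
`(gdb) break dump.c:2266`
`(gdb) continue`
`Breakpoint 1, Perl_runops_debug () at dump.c:2266`
`(gdb) call (void*)Perl_eval_pv("require Enbugger; Enbugger->stop;", 0)`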
"error for object 0x3" on eval sound like internal corruption in the context, so you should look at the cx and stack pointers. Probably because you started it in a bad context.

I've never used gdb, but maybe you could get something useful out of strace?
strace -f -s512 -p <PID>
(-f also traces child processes; -s512 raises the printed string length to 512 bytes, so you can see what the process is actually reading or writing.)

http://metacpan.org/pod/App::Stacktrace
“perl-stacktrace prints Perl stack traces of Perl threads for a given Perl process. For each Perl frame, the full file name and line number are printed.”
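Usage appears to be simply the following (check the module's documentation to confirm):
$ perl-stacktrace <pid>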

Related

Catch only second chance exception with windbg

I need to debug a program running on Windows.
It sometimes crashes with "memory access violation".
With windbg (use of an IDE is not possible) I attached to the running process (it is a requirement that the program must not stop).
The command line is
windbg -g -p <pid>
The problem is that I now catch all first-chance exceptions, but I am only interested in second-chance exceptions (I do not care which type of exception).
How can I setup windbg to catch any second chance exception?
WinDbg will catch second chance exceptions by default, so you just need to turn off the first chance exceptions. Doing this for a single type of exception is simple:
0:000> sxd av
0:000> *** Check the setting
0:000> .shell -ci "sx" find "av"
See "set all exceptions" to set all exception types to second-chance only.
Since it does not seem to be an option to perform those commands at debug time, you can also try to configure a Workspace that has exception handling disabled and then reuse the workspace. For understanding the concept of Workspaces, the MSDN article Uncovering how Workspaces work was really helpful. It is a set of experiments that you should do yourself once.
With that background knowledge, attach to any process
0:000> .foreach(exc {sx}) {.catch{sxd ${exc}}}
0:000> *** perhaps some other useful workspace relevant commands here
0:000> *** e.g. .symfix seems useful
0:000> *** File / Save Workspace As ...
0:000> *** Enter a name, e.g. myworkspace
0:000> q
Restart WinDbg with the -W myworkspace command line switch. Attach to any process. Check whether your settings have been applied (e.g. with sx and .sympath). If everything is fine, you can start debugging.
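For example, combining the switches already mentioned above:
windbg -W myworkspace -g -p <pid>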

Odd behavior with Perl system() command

Note that I'm aware that this is probably not the best or most optimal way to do this but I've run into this somewhere before and I'm curious as to the answer.
I have a perl script that is called from an init that runs and occasionally dies. To quickly debug this, I put together a quick wrapper perl script that basically consists of
# $path set from library call.
while (1) {
    system("$path/command.pl " . join(" ", @ARGV) . " >>/var/log/outlog 2>&1");
    sleep 30; # Added this one later. See below...
}
Fire this up from the command line and it runs fine and as expected. command.pl is called and the script basically halts there until the child process dies then goes around again.
However, when called from a start script (actually via start-stop-daemon), the system command returns immediately, leaving command.pl running. Then it goes around for another go. And again and again. (This was not fun without the sleep command.). ps reveals the parent of (the many) command.pl to be 1 rather than the id of the wrapper script (which it is when I run from the command line).
Anyone know what's occurring?
Maybe the command.pl is not being run successfully. Maybe the file doesn't have execute permission (do you need to say perl command.pl?). Maybe you are running the command from a different directory than you thought, and the command.pl file isn't found.
There are at least three things you can check:
the standard error output of your command. For now you are swallowing it with 2>&1. Remove that part and observe what errors the command produces.
the return value of system. The command may run and still exit with a non-zero status, but if system returns 0 you know the command completed successfully.
Perl's error variable $!. If system could not launch the command at all, Perl sets $!, which may or may not be helpful.
To summarize, try:
my $ec = system("command.pl >> /var/log/outlog");
if ($ec != 0) {
    # system returns the raw wait status; the command's exit code is $ec >> 8
    warn "wait status was $ec (exit code ", $ec >> 8, "), \$! is $!";
}
Update: if multiple instances of the command keep showing up in your ps output, then it sounds like the program is forking and running itself in the background. If that is indeed what the command is supposed to do, then what you do NOT want to do is run this command in an endless loop.
Perhaps when run from a daemon the system command is using a different shell than the one used when you run it as yourself. Maybe the shell used by the daemon does not recognize the >& construct.
Instead of system("..."), try the exec("...") function, if that works for you.
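If the command really is supposed to stay in the foreground, here is a sketch of the same wrapper using fork/waitpid instead of system, which makes the parent/child relationship explicit (the path is hypothetical):
#!/usr/bin/perl
use strict;
use warnings;

my $path = '/opt/app';               # hypothetical; set from the library call as before

while (1) {
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {                 # child: redirect output, then become command.pl
        open STDOUT, '>>', '/var/log/outlog' or die "open: $!";
        open STDERR, '>&', \*STDOUT  or die "dup: $!";
        exec "$path/command.pl", @ARGV or die "exec failed: $!";
    }
    waitpid($pid, 0);                # block until the child really exits
    sleep 30;
}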

Trapping signals cleanly in Perl

I have a simple Perl script that simply prints a line of text to stdout. What I want to accomplish is that while this script runs, if I (or someone else) issues a signal to that process to stop, I want it to trap that signal and exit cleanly. The code I have looks like the following
#!/usr/bin/perl -w
$| = 1;
use sigtrap 'handler' => \&sigtrap, 'HUP', 'INT', 'ABRT', 'QUIT', 'TERM';
while (1) {
    print "Working...\n";
    sleep(2);
}
sub sigtrap {
    print "Caught a signal\n";
    exit(1);
}
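Note that use sigtrap here is essentially a compile-time wrapper around %SIG; the equivalent runtime assignment would be (a sketch):
$SIG{$_} = \&sigtrap for qw(HUP INT ABRT QUIT TERM);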
While this works well when I actually hit ctrl-c from the command line, if I issue a
kill -9 <pid>
It just dies. How do I get it to execute something before exiting? My general idea is to use this framework to capture when this script dies on a server due to a server reboot for maintenance or failure.
Thanks much in advance
Signal #9 (SIGKILL) can not be trapped. That's how Unix is designed.
But the system does not send that signal when shutting down for maintenance, at least not if your daemon behaves correctly. It will normally send the TERM signal (or, more exactly, whatever your daemon-handling script in /etc/init.d sends). Only processes that do not shut down correctly after a timeout will receive SIGKILL.
So your aim should be to correctly handle the TERM signal and to write the wrapper script in /etc/init.d that will be called when the system is changing runlevel.
Update: You can use the Daemon::Control module for the init script.
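A minimal sketch of such an init script with Daemon::Control (names and paths are hypothetical; see the module's documentation for the full option list):
#!/usr/bin/perl
use strict;
use warnings;
use Daemon::Control;

exit Daemon::Control->new(
    name     => 'my_worker',                  # hypothetical service name
    program  => '/usr/local/bin/worker.pl',   # hypothetical daemon path
    pid_file => '/var/run/my_worker.pid',
)->run;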
You're sending two very different signals to your process. Pressing Ctrl-C in the console usually sends the process an INT signal (SIGINT), which, judging by your code, is caught and handled. kill -9, though, sends signal number 9 explicitly, which is KILL. This is one of the signals whose handling cannot be redefined; delivery of this signal always immediately ends the process, which is done by the kernel itself.
As far as I know you can't capture kill -9. Try kill <pid> instead.

Which (untrapped) signals will cause a Perl program to stop executing?

What signals will cause a Perl program to stop running if their %SIG entries are not explicitly set?
The answer is platform dependent. To see the default behavior of each signal on your own system, download the Signals::XSIG distribution (you don't need to install it) and run the included program spike/analyze_default_signal_behavior.pl (with no arguments), or just download and run that one script by itself.
Note that some signals cannot be trapped by your program even if you do install a %SIG handler. This is also system dependent but usually includes at least SIGKILL and SIGSTOP.
It is easier to talk about the ones that won't stop your program. On my machine (RHEL), everything but FPE (floating point exception), CHLD (child status change), CONT (continue process), URG (urgent condition on socket), and WINCH (window size change) causes the Perl program to stop executing.
Four of the signals don't cause the program to exit, but temporarily stop its execution: STOP (stop, unblockable), TSTP (terminal stop), TTIN (background read from tty), and TTOU (background write to tty). The program will start running again if it receives CONT.
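A sketch of such a probe, in the same spirit as the analyze script mentioned above: fork a child that keeps the default %SIG entries, send it one signal, and decode the wait status:
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ':sys_wait_h';   # WUNTRACED, WIFSTOPPED, WTERMSIG, ...

my $sig = shift @ARGV // 'TERM';
my $pid = fork() // die "fork: $!";
if ($pid == 0) {           # child: default signal handlers
    sleep 5;
    exit 0;
}
sleep 1;                   # let the child reach sleep()
kill $sig, $pid or die "kill $sig: $!";
waitpid($pid, WUNTRACED);
if (WIFSTOPPED($?)) {
    printf "SIG%s stopped the process (signal %d)\n", $sig, WSTOPSIG($?);
    kill 'KILL', $pid;     # clean up the stopped child
    waitpid($pid, 0);
}
elsif (WIFSIGNALED($?)) {
    printf "SIG%s killed the process (signal %d)%s\n",
           $sig, WTERMSIG($?), ($? & 128) ? ", core dumped" : "";
}
else {
    printf "SIG%s did not kill it; child exited with code %d\n",
           $sig, WEXITSTATUS($?);
}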
From man kill on Debian,
Name     Num  Action   Description
0          0  n/a      exit code indicates if a signal may be sent
ALRM      14  exit
HUP        1  exit
INT        2  exit
KILL       9  exit     cannot be blocked
PIPE      13  exit
POLL          exit
PROF          exit
TERM      15  exit
USR1          exit
USR2          exit
VTALRM        exit
STKFLT        exit     might not be implemented
PWR           ignore   might exit on some systems
WINCH         ignore
CHLD          ignore
URG           ignore
TSTP          stop     might interact with the shell
TTIN          stop     might interact with the shell
TTOU          stop     might interact with the shell
STOP          stop     cannot be blocked
CONT          restart  continue if stopped, otherwise ignore
ABRT       6  core
FPE        8  core
ILL        4  core
QUIT       3  core
SEGV      11  core
TRAP       5  core
SYS           core     might not be implemented
EMT           core     might not be implemented
BUS           core     core dump might fail
XCPU          core     core dump might fail
XFSZ          core     core dump might fail

How to use a shell script to supervise a program?

I've searched around but haven't quite found what I'm looking for. In a nutshell, I have created a bash script that runs in an infinite while loop, sleeping and checking if a process is running. The only problem is that even if the process is running, it says it is not and opens another instance.
I know I should check by process name and not process id, since another process could jump in and take the id. However all perl programs are named Perl5.10.0 on my system, and I intend on having multiple instances of the same perl program open.
The following "if" always returns false. What am I doing wrong here?
while true; do
    if [ ps -p $pid ]; then
        echo "Program running fine"
        sleep 10
    else
        echo "Program being restarted\n"
        perl program_name.pl &
        sleep 5
        read -r pid < "${filename}_pid.txt"
    fi
done
Get rid of the square brackets. It should be:
if ps -p $pid; then
(You may also want to silence ps's output, e.g. if ps -p "$pid" > /dev/null; then.)
The square brackets are syntactic sugar for the test command. This is an entirely different beast and does not invoke ps at all:
if test ps -p $pid; then
In fact that yields "-bash: [: -p: binary operator expected" when I run it.
Aside from the syntax error already pointed out, this is a lousy way to ensure that a process stays alive.
First, you should find out why your program is dying in the first place; this script doesn't fix a bug, it tries to hide one.
Secondly, if it is so important that the program remain running, why do you expect your (at least once already) buggy shell script to do the job? Use a system facility that is specifically designed to restart server processes. If you say what platform you are using and the nature of your server process, I can offer more concrete advice.
added in response to comment:
Sure, there are engineering exigencies, but as the OP noted in the question, there is still a bug in this attempt at a solution:
I know I should check by process name
and not process id, since another
process could jump in and take the id.
So now you are left with a PID-tracking script, not a process "nanny". Although the chances are small, the script as it now stands has a ten-second window in which:
the "monitored" process fails
I start up my week-long emacs process, which grabs the same PID
the nanny script continues on, blissfully unaware that its dependent has failed
The script isn't merely buggy, it is invalid because it presumes that PIDs are stable identifiers of a process. There are ways that this could be better handled even at the shell script level. The simplest is to never detach the execution of perl from the script since the script is doing nothing other than watching the subprocess. For example:
while true ; do
    if perl program_name.pl ; then
        echo "program_name terminated normally, restarting"
    else
        echo "oops program_name died again, restarting"
    fi
done
This is not only shorter and simpler, but it actually blocks for the condition you are really interested in: the run state of the perl program. The original script repeatedly checks a bad proxy indication of that run-state condition (the PID) and so can get it wrong. And since the whole purpose of this nanny script is to handle faults, it would be bad if it were faulty by design.
I totally agree that fiddling with the PID is nearly always a bad idea. The while true ; do ... done script is quite good; however, for production systems there are a couple of process supervisors which do exactly this and much more, e.g.
enable you to send signals to the supervised process (without knowing its PID)
check how long a service has been up or down
capture its output and write it to a log file
Examples of such process supervisors are daemontools or runit. For a more elaborate discussion and examples, see Init scripts considered harmful. Don't be disturbed by the title: traditional init scripts suffer from exactly the same problem as yours (they start a daemon, keep its PID in a file, and then leave the daemon alone).
I agree that you should find out why your program is dying in the first place. However, an ever-running shell script is probably not a good idea. What if this supervising shell script dies? (And yes, get rid of the square brackets around ps -p $pid. You want the exit status of the ps -p $pid command; the square brackets are a replacement for the test command.)
There are two possible solutions:
Use cron to run your "supervising" shell script periodically to see if the process you're supervising is still running, and if it isn't, restart it. The supervised process can write its PID into a file. Your supervising program can then cat this file and get the PID to check.
If the program you're supervising is providing a service upon a particular port, make it an inetd service. This way, it isn't running at all until there is a request upon that port. If you set it up correctly, it will terminate when not needed and restart when needed. Takes less resources and the OS will handle everything for you.
That's what kill -0 $pid is for. It returns success if a process with pid $pid exists.
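The same check is available from inside Perl, if the supervisor itself is written in Perl (a minimal sketch):
#!/usr/bin/perl
use strict;
use warnings;
use Errno;

my $pid = shift @ARGV or die "usage: $0 pid\n";
# Signal 0 performs the existence/permission check without delivering anything.
if (kill 0, $pid) {
    print "process $pid is alive\n";
}
elsif ($!{EPERM}) {        # it exists, but belongs to another user
    print "process $pid is alive, but not ours\n";
}
else {
    print "process $pid is gone\n";
}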