Which signals should a wrapper script pass along to a subprogram? - perl

If I have a script that is a wrapper for another program (e.g., a daemonizer wrapper or a wrapper for mathematica), it is sometimes useful to trap signals in the wrapper program and pass them along to the subprogram.
For example, here's some Perl code to deal with the INT (interrupt) signal so that if you do ctrl-C after launching the wrapper the subprogram also gets interrupted:
my $subprogram = "foo args";
my $pid = open(F, "$subprogram |") or die "Can't open pipe from $subprogram: $!";
$SIG{INT} = sub {
    kill('INT', $pid);
    close(F);
    die "Passed along INT signal and now aborting.\n";
};
print while (<F>);
close(F);
Of [all the possible signals](http://en.wikipedia.org/wiki/Signal_(computing)) that a program might handle, which ones should a wrapper script pass along?
Is there anything else that a good wrapper should do?
EDIT: Originally this question was asking how to pass along all possible signals. Thanks to the initial answers I learned that was not the right question.
EDIT: I figured out what threw me for a loop here. Mathematica apparently detaches itself from its parent process. So I have to pass along various termination signals explicitly:
$SIG{INT} = sub { kill('INT', $pid); }; # pass along SIGINT (eg, ctrl-C)
$SIG{TERM} = sub { kill('TERM', $pid); }; # pass along SIGTERM (kill's default)
$SIG{QUIT} = sub { kill('QUIT', $pid); }; # pass along SIGQUIT
$SIG{ABRT} = sub { kill('ABRT', $pid); }; # pass along SIGABRT
$SIG{HUP} = sub { kill('HUP', $pid); }; # pass along SIGHUP
Normally this would not be necessary as child processes automatically get these signals passed along (thanks for the answers that set me straight on this!). So now I'm wondering why (and how) mathematica detaches itself...

There are many signals that would be dangerous to just "pass through" in this way. Run "man 7 signal" and look at things like SIGPIPE, SIGILL, SIGCHLD, etc. It's very likely that you just don't want to touch those guys.
So, the real question becomes: What is the behavior you're looking for? I bet all you really want is SIGINT & SIGCONT to be passed through to the child. Does the subprogram do anything fancy with other signals, like HUP, USR1, or USR2?
I presume you're really just interested in SIGINT & SIGCONT, and the rest (i.e. SIGSEGV, SIGKILL, etc.) will take care of themselves, as the parent process termination will clean up the child as well.
Oh, and by the way, your example of:
print while(<F>);
is fine: if the parent perl process is suspended, it won't continue reading from F; once that pipe fills up, your subprogram will block writing to stdout, which is probably pretty much the behavior you want as well.
For some more interesting thoughts, take a look at "man bash" and the "trap" builtin to see what the shell developers have done to help this problem.

None.
The signals that you might generate at the keyboard are sent to the child process anyway, unless the child takes steps to avoid that (in which case, who are you to interfere).
The other signals that might kill the parent process would normally cause the child to disappear when it next writes to the pipe that was closed when the parent died.
So, unless you have reason to think that the child process mismanages signals, I would not worry about relaying any signals to the child. And if the child mismanages signals, maybe you should fix the child rather than hack the parent.

I really don't see why you would want to do that. For that matter, I don't see why you would want to send signals to the wrapper script.
ETA: Stop and restart signals are sent to the whole process group. If that's what you're after: just make sure your child process is part of that process group (it is by default AFAIK).
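If you do end up needing to signal a whole process group explicitly from the wrapper (for example because the child spawned children of its own), Perl's kill can target a process group when given a negative signal. A minimal sketch, with $pid being the child from the question:
use POSIX ();    # only for clarity; getpgrp and kill are core builtins

# Assumption: $pid is the child the wrapper started, and it belongs to
# the process group we care about.
my $pgid = getpgrp($pid);          # look up the child's process group

# A signal name with a leading '-' (or a negative signal number) asks
# kill() to signal the entire process group rather than one process.
kill '-INT', $pgid
    or warn "Couldn't signal process group $pgid: $!";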
ETA2:
If mathematica really detaches itself (which can be done using setsid() and setpgid()), that means it explicitly does not want to receive such signals, so you shouldn't send them: it probably won't handle them anyway.
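For illustration only (this is not Mathematica's actual code), a program typically detaches itself along these lines using POSIX::setsid; once it has done so, it no longer belongs to the wrapper's session or process group:
use POSIX qw(setsid);

# Hedged sketch of a classic 'detach' sequence a child program might run.
exit 0 if fork();                   # intermediate parent exits; child carries on
setsid()                            # start a brand-new session and process group,
    or die "setsid failed: $!";     # cutting ties with the wrapper's terminal
# From here on, terminal-generated signals (ctrl-C etc.) aimed at the
# wrapper's process group never reach this process - hence the need to
# forward them explicitly, if you forward them at all.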

You can get a list of all signals with keys %SIG. So, to pass them all through, you set handlers to send the signal the child process ID:
for my $signal (keys %SIG) {
    $SIG{$signal} = sub { kill($signal, $child_pid); };
}
You'll probably want to localize %SIG.
Not that I'm saying it's a good idea, but that's how to do it.
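For completeness, the usual safe way to localize handlers is per key rather than for the whole hash; a small sketch (the subroutine name and the choice of signals are just for illustration):
sub run_child_with_forwarding {
    my ($child_pid) = @_;

    # local() restores the previous handlers automatically when this
    # subroutine returns, even if it exits via die().
    local $SIG{INT}  = sub { kill 'INT',  $child_pid };
    local $SIG{TERM} = sub { kill 'TERM', $child_pid };

    waitpid( $child_pid, 0 );    # forward signals while the child runs
}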

Related

Perl daemonize with child daemons

I have to employ daemons in my code. I need a control daemon that constantly checks the database for tasks and supervises child daemons. The control daemon must assign tasks to the child daemons, control the tasks, create new children if one of them dies, etc. The child daemons check the database for tasks assigned to them (by PID). How should I implement daemons for this purpose?
Daemon is just a code word for "background process that runs a long time". So the answer is 'it depends'. Perl has two major ways of doing multiprocessing:
Threading
You run a subroutine as a thread, in parallel with the main program code. (Which may then just monitor thread states).
The overhead of creating a thread is higher, but it's better suited for 'shared memory' style multiprocessing, e.g. when you're passing significant quantities of data back and forth. There are several libraries that make passing information between threads positively straightforward. Personally I quite like Thread::Queue, Thread::Semaphore and Storable.
In particular - Storable has freeze and thaw which lets you move complex data structures (e.g. objects/hashes) around in queues, which is very useful.
Basic threading example:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $nthreads = 5;

my $process_q = Thread::Queue->new();
my $failed_q  = Thread::Queue->new();

#this is a subroutine, but one that runs 'as a thread'.
#when it starts, it inherits the program state 'as is'. E.g.
#the variable declarations above all apply - but changes to
#values within the program are 'thread local' unless the
#variable is defined as 'shared'.
#Behind the scenes - Thread::Queue objects are 'shared' arrays.
sub worker {
    #NB - this will sit in a loop indefinitely, until you close the queue
    #using $process_q->end
    #we do this once we've queued all the things we want to process
    #and the sub completes and exits neatly.
    #however if you _don't_ end it, this will sit waiting forever.
    while ( my $server = $process_q->dequeue() ) {
        chomp($server);
        print threads->self()->tid() . ": pinging $server\n";
        my $result = `/bin/ping -c 1 $server`;
        if ($?) { $failed_q->enqueue($server) }
        print $result;
    }
}

#insert tasks into the thread queue.
open( my $input_fh, "<", "server_list" ) or die $!;
$process_q->enqueue(<$input_fh>);
close($input_fh);

#we 'end' process_q - when we do, no more items may be inserted,
#and 'dequeue' returns undef once the queue is emptied.
#this means our worker threads (in their 'while' loop) will then exit.
$process_q->end();

#start some threads
for ( 1 .. $nthreads ) {
    threads->create( \&worker );
}

#wait for the threads to all finish processing.
foreach my $thr ( threads->list() ) {
    $thr->join();
}

#collate results. ('synchronise' operation)
while ( my $server = $failed_q->dequeue_nb() ) {
    print "$server failed to ping\n";
}
Storable
When it comes to Storable, this is worth a separate example I think, because it's handy to move data around.
use threads;
use Storable qw ( freeze thaw );
use MyObject;    #home made object.
use Thread::Queue;

my $work_q = Thread::Queue->new();

sub worker_thread {
    while ( my $packed_item = $work_q->dequeue ) {
        my $object = thaw($packed_item);
        $object->run_some_methods();
        $object->set_status("processed");
        #maybe return $object via 'freeze' and a queue?
    }
}

my $thr = threads->create( \&worker_thread );
my $newobject = MyObject->new("some_parameters");
$work_q->enqueue( freeze($newobject) );
$work_q->end();
$thr->join();
Because you're passing the object around within the queue, you're effectively cloning it between threads. So bear in mind that you may need to freeze it and 'return' it somehow once you've done something to its internal state. But it does mean you can do this asynchronously without needing to arbitrate locking or shared memory. You may also find it useful to be able to 'store' and 'retrieve' an object - this works as you might expect. (Although I daresay you might need to be careful about availability of module versions vs. defined attributes if you're retrieving a stored object.)
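For example, a minimal store/retrieve round trip (the file name and data are made up) might look like this:
use Storable qw( store retrieve );

my $object = { name => 'job-42', status => 'pending' };

store( $object, 'job.stor' );        # serialise the structure to a file
my $copy = retrieve('job.stor');     # read it back as a deep clone

print $copy->{name}, "\n";           # prints "job-42"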
Forking
Your script clones itself, leaving a 'parent' and 'child' - the child then generally diverges and does something different. This uses the Unix built-in fork(), which as a result is well optimised and generally very efficient - but because it's low level, it's difficult to do lots of data transfer. You'll end up doing some slightly complicated things for interprocess communication - IPC. (See perlipc for more detail.) It's efficient not least because most fork() implementations do a lazy data copy - memory space for your process is only allocated as it's needed, e.g. when it's changed.
It's therefore really good if you want to delegate a lot of tasks that don't require much supervision from the parent. For example - you might fork a web server, because the child is reading files and delivering them to a particular client, and the parent doesn't much care. Or perhaps you would do this if you want to spend a lot of CPU time computing a result, and only pass that result back.
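To make that 'pass the result back' idea concrete, here's a minimal raw fork() sketch (not part of the answer above, just an illustration of the perlipc-style pattern):
#!/usr/bin/perl
use strict;
use warnings;

pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {              # child
    close $reader;
    my $result = 6 * 7;         # stand-in for an expensive computation
    print {$writer} "$result\n";
    close $writer;
    exit 0;
}

close $writer;                  # parent
chomp( my $answer = <$reader> );
close $reader;
waitpid( $pid, 0 );
print "child said: $answer\n";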
fork() is also not supported natively on Windows (Perl emulates it there using threads).
Useful libraries include Parallel::ForkManager
A basic example of 'forking' code looks a bit like this:
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $concurrent_fork_limit = 4;
my $fork_manager = Parallel::ForkManager->new($concurrent_fork_limit);
foreach my $thing ( "fork", "spoon", "knife", "plate" ) {
    my $pid = $fork_manager->start;
    if ($pid) {
        print "$$: Fork made a child with pid $pid\n";
    } else {
        print "$$: child process started, with a key of $thing ($pid)\n";
    }
    $fork_manager->finish;
}
$fork_manager->wait_all_children();
Which is right for you?
So it's hard to say without a bit more detail about what you're trying to accomplish. This is why Stack Overflow usually likes to see some working code, the approaches you've tried, and so on.
I would generally say:
if you need to pass data around, use threads. Thread::Queue especially when combined with Storable is very good for it.
if you don't, forks (on Unix) are generally faster/more efficient. (But fast alone isn't usually enough - write understandable stuff first, and aim for speed second. It doesn't matter much usually).
Avoid where possible spawning too many threads - they're fairly expensive in both memory and creation overhead. You're far better off using a fixed number in a 'worker thread' style of programming than repeatedly creating new, short-lived threads. (Forks, on the other hand, are quite cheap to spawn, because fork() copy-on-writes rather than eagerly copying your whole process.)
I would suggest in the scenario you give - you're looking at threads and queues. Your parent process can track child threads via threads -> list() and join or create to keep the right number. And can feed data via a central queue to your worker threads. Or have multiple queues - one per 'child' and use that as a task assignment system.
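A rough sketch of that last 'one queue per child' idea, applied to the control-daemon scenario (the worker count and task names are placeholders, and the database polling is left as a comment):
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $n_workers = 4;
my %queue_for;    # worker tid => that worker's private task queue

sub worker {
    my ($my_q) = @_;
    while ( defined( my $task = $my_q->dequeue() ) ) {
        print threads->self()->tid(), ": working on $task\n";
        # ... do the task, update the database, etc. ...
    }
}

# spawn the workers, each with its own queue
for ( 1 .. $n_workers ) {
    my $q   = Thread::Queue->new();
    my $thr = threads->create( \&worker, $q );
    $queue_for{ $thr->tid() } = $q;
}

# control loop: here a fixed list stands in for polling the database,
# with tasks assigned round-robin to the per-worker queues
my @tids = sort { $a <=> $b } keys %queue_for;
my $i    = 0;
for my $task ( 'task-a', 'task-b', 'task-c', 'task-d' ) {
    $queue_for{ $tids[ $i++ % @tids ] }->enqueue($task);
}

# shut down: ending each queue makes dequeue() return undef, so the
# workers fall out of their loops and can be joined
$_->end()  for values %queue_for;
$_->join() for threads->list();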

perl Win32 Signal Handling between perl processes

I have a couple long running perl scripts in windows (strawberry perl) that I'm working on.
The first process is a parent monitoring process. It restarts the child process every 24 hours and will be always running.
The second is the child payment processing script. It is imperative that this process completes whatever it's doing before being shutdown.
It's my understanding that signal handling doesn't work in perl on win32 and that it shouldn't be relied on. Is there some other way that I can handle a signal? Win32::Process::Kill seems to kill the process without letting it safely shut down.
This is the signal handling that I've tried...
#Child
my $interrupted = 0;
$SIG{INT} = sub { $interrupted = 1; };
while (!$interrupted) {
    #keep doing your thing, man
}

#Parent
use IPC::Open2;    # needed for open2()
my $pid = open2( \*CHLD_OUT, \*CHLD_IN, 'C:\\strawberry\\perl\\bin\\perl.exe', 'process.pl' );
kill INT => $pid;
waitpid( $pid, 0 );
The only other thing I can think of is to open a socket between the two processes and write messages across the socket. But there must be something easier. Anyone know of any module that can do this?
Update
I've started working on creating a "signal" mechanism via IO::Socket::INET and IO::Select by opening a socket. This appears to work and I'm thinking of writing a module that is compatible with AnyEvent. But I'm still interested in an implementation that doesn't require opening a listening port and that doesn't require a server/client relationship. Is it possible to do this by subscribing to and firing custom events in windows?
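For reference, the loopback-socket "signal" described above can be sketched roughly like this (parent and child sides are shown together; the port handoff and the SHUTDOWN message are illustrative, not a finished protocol):
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# --- Parent side: listen on loopback, pass the chosen port to the child ---
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,               # let the OS pick a free port
    Listen    => 1,
    Proto     => 'tcp',
) or die "listen: $!";
my $port = $listener->sockport(); # e.g. hand this to the child on @ARGV
# ... spawn the child, accept() its connection, later print "SHUTDOWN\n" ...

# --- Child side: poll the control socket between safe units of work ---
my $ctrl = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $!";
my $sel = IO::Select->new($ctrl);

while (1) {
    # ... process one payment, leaving everything in a consistent state ...
    if ( $sel->can_read(0) ) {    # non-blocking check for a 'signal'
        my $msg = <$ctrl>;
        last if !defined $msg || $msg =~ /^SHUTDOWN/;
    }
}
# ... flush state, close handles, exit cleanly ...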
Hmm, an interesting question. One thing I'd be wondering - how feasible is it to rewrite your code to thread?
When faced with a similar problem I found encapsulating the 'child' process as a thread meant I could better 'manage' it from the parent.
e.g.:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my $interrupted : shared;

sub child {
    while ( not $interrupted ) {
        #loop;
    }
}

#main process
while (1) {
    $interrupted = 0;
    my $child = threads->create( \&child );
    sleep 60;
    $interrupted = 1;
    $child->join();
    sleep(3600);
}
But with threading you also get its IPC mechanisms - Thread::Queue, threads::shared, Thread::Semaphore - and, I'm at least fairly sure, the ability to send pseudo 'kill' signals within the script, because threads emulates 'kill' signals internally.
http://www.perlmonks.org/?node_id=557328
Add to your thread:
$SIG{'TERM'} = sub { threads->exit(); };
And then your 'main' can:
$thr->kill('TERM')->detach();
With ActiveState Perl I use windows native events through the Win32::Event module.
This way you don't need to implement anything fancy and you can even have your script interact with native applications.
I use this module in many applications.
Since it is part of Win32::IPC, which uses native code, it may not be available for Strawberry Perl. If that is the case you could try compiling it from the CPAN sources. It might be worth a try if you have lots of Windows perl-based software.
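A hedged sketch of that approach (the event name is arbitrary, and wait(0) is used to poll without blocking), assuming Win32::Event behaves as documented in Win32::IPC:
use Win32::Event;

# Parent / monitoring process: create a named manual-reset event and
# signal it when the child should finish up.
my $stop = Win32::Event->new( 1, 0, 'MyApp_Shutdown' )
    or die "could not create event: $^E";
# ... later, when the 24 hours are up:
$stop->set();

# Child / payment process: open the same named event and poll it
# between units of work that are safe to stop after.
my $shutdown = Win32::Event->open('MyApp_Shutdown')
    or die "could not open event: $^E";
while (1) {
    # ... process one payment completely ...
    last if $shutdown->wait(0);    # true once the parent has set() it
}
# ... clean up and exit ...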

Perl, how to fetch data from urls in parallel?

I need to fetch some data from many web data providers, who do not expose any service, so I have to write something like this, using for example WWW::Mechanize:
use WWW::Mechanize;
my @urls = ('http://www.first.data.provider.com', 'http://www.second.data.provider.com', 'http://www.third.data.provider.com');
my %results = ();
foreach my $url (@urls) {
    my $mech = WWW::Mechanize->new();
    $mech->get($url);
    $mech->form_number(1);
    $mech->set_fields('user' => 'myuser', pass => 'mypass');
    my $resp = $mech->submit();
    $results{$url} = parse($resp->content());
}
consume(%results);
Is there some (possibly simple ;-) way to fetch data to a common %results variable, simultaneously, i.e: in parallel, from all the providers?
threads are to be avoided in Perl. use threads is mostly for
emulating UNIX-style fork on Windows; beyond that, it's pointless.
(If you care, the implementation makes this fact very clear. In perl,
the interpreter is a PerlInterpreter object. The way threads
works is by making a bunch of threads, and then creating a brand-new
PerlInterpreter object in each thread. Threads share absolutely
nothing, even less than child processes do; fork gets you
copy-on-write, but with threads, all the copying is done in Perl
space! Slow!)
If you'd like to do many things concurrently in the same process, the
way to do that in Perl is with an event loop, like
EV,
Event, or
POE, or by using Coro. (You can
also write your code in terms of the AnyEvent API, which will let
you use any event loop. This is what I prefer.) The difference
between the two is how you write your code.
AnyEvent (and EV, Event,
POE, and so on) forces you to write your code in a callback-oriented
style. Instead of control flowing from top to bottom, control is in a
continuation-passing style. Functions don't return values, they call
other functions with their results. This allows you to run many IO
operations in parallel -- when a given IO operation has yielded
results, your function to handle those results will be called. When
another IO operation is complete, that function will be called. And
so on.
The disadvantage of this approach is that you have to rewrite your
code. So there's a module called Coro that gives Perl real
(user-space) threads that will let you write your code top-to-bottom,
but still be non-blocking. (The disadvantage of this is that it
heavily modifies Perl's internals. But it seems to work pretty well.)
So, since we don't want to rewrite
WWW::Mechanize
tonight, we're going to use Coro. Coro comes with a module called
Coro::LWP that will make
all calls to LWP be
non-blocking. It will block the current thread ("coroutine", in Coro
lingo), but it won't block any other threads. That means you can make
a ton of requests all at once, and process the results as they become
available. And Coro will scale better than your network connection;
each coroutine uses just a few k of memory, so it's easy to have tens
of thousands of them around.
With that in mind, let's see some code. Here's a program that starts
three HTTP requests in parallel, and prints the length of each
response. It's similar to what you're doing, minus the actual
processing; but you can just put your code in where we calculate the
length and it will work the same.
We'll start off with the usual Perl script boilerplate:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';    # the examples below use say()
Then we'll load the Coro-specific modules:
use Coro;
use Coro::LWP;
use EV;
Coro uses an event loop behind the scenes; it will pick one for you if
you want, but we'll just specify EV explicitly. It's the best event
loop.
Then we'll load the modules we need for our work, which is just:
use WWW::Mechanize;
Now we're ready to write our program. First, we need a list of URLs:
my @urls = (
    'http://www.google.com/',
    'http://www.jrock.us/',
    'http://stackoverflow.com/',
);
Then we need a function to spawn a thread and do our work. To make a
new thread on Coro, you call async like async { body; of the
thread; goes here }. This will create a thread, start it, and
continue with the rest of the program.
sub start_thread($) {
    my $url = shift;
    return async {
        say "Starting $url";
        my $mech = WWW::Mechanize->new;
        $mech->get($url);
        printf "Done with $url, %d bytes\n", length $mech->content;
    };
}
So here's the meat of our program. We just put our normal LWP program
inside async, and it will be magically non-blocking. get blocks,
but the other coroutines will run while waiting for it to get the data
from the network.
Now we just need to start the threads:
start_thread $_ for @urls;
And finally, we want to start handling events:
EV::loop;
And that's it. When you run this, you'll see some output like:
Starting http://www.google.com/
Starting http://www.jrock.us/
Starting http://stackoverflow.com/
Done with http://www.jrock.us/, 5456 bytes
Done with http://www.google.com/, 9802 bytes
Done with http://stackoverflow.com/, 194555 bytes
As you can see, the requests are made in parallel, and you didn't have
to resort to threads!
Update
You mentioned in your original post that you want to limit the number of HTTP requests that run in parallel. One way to do that is with a semaphore,
Coro::Semaphore in Coro.
A semaphore is like a counter. When you want to use the resource that a semaphore protects, you "down" the semaphore. This decrements the counter and continues running your program. But if the counter is at zero when you try to down the semaphore, your thread/coroutine will go to sleep until it is non-zero. When the count goes up again, your thread will wake up, down the semaphore, and continue. Finally, when you're done using the resource that the semaphore protects, you "up" the semaphore and give other threads the chance to run.
This lets you control access to a shared resource, like "making HTTP requests".
All you need to do is create a semaphore that your HTTP request threads will share:
my $sem = Coro::Semaphore->new(5);
The 5 means "let us call 'down' 5 times before we block", or, in other words, "let there be 5 concurrent HTTP requests".
Before we add any code, let's talk about what can go wrong. Something bad that could happen is a thread "down"-ing the semaphore, but never "up"-ing it when it's done. Then nothing can ever use that resource, and your program will probably end up doing nothing. There are lots of ways this could happen. If you wrote some code like $sem->down; do something; $sem->up, you might feel safe, but what if "do something" throws an exception? Then the semaphore will be left down, and that's bad.
Fortunately, Perl makes it easy to have scope Guard objects, that will automatically run code when the variable holding the object goes out of scope. We can make the code be $sem->up, and then we'll never have to worry about holding a resource when we don't intend to.
Coro::Semaphore integrates the concept of guards, meaning you can say my $guard = $sem->guard, and that will automatically down the semaphore and up it when control flows away from the scope where you called guard.
With that in mind, all we have to do to limit the number of parallel requests is guard the semaphore at the top of our HTTP-using coroutines:
async {
    say "Waiting for semaphore";
    my $guard = $sem->guard;
    say "Starting";
    ...;
    return result;
};
Addressing the comments:
If you don't want your program to live forever, there are a few options. One is to run the event loop in another thread, and then join on each worker thread. This lets you pass results from the thread to the main program, too:
async { EV::loop };

# start all threads
my @running = map { start_thread $_ } @urls;

# wait for each one to return
my @results = map { $_->join } @running;
for my $result (@results) {
    say $result->[0], ': ', $result->[1];
}
Your threads can return results like:
sub start_thread($) {
    my $url = shift;
    return async {
        ...;
        return [$url, length $mech->content];
    };
}
This is one way to collect all your results in a data structure. If you don't want to return things, remember that all the coroutines share state. So you can put:
my %results;
at the top of your program, and have each coroutine update the results:
async {
    ...;
    $results{$url} = 'whatever';
};
When all the coroutines are done running, your hash will be filled with the results. You'll have to join each coroutine to know when the answer is ready, though.
Finally, if you are doing this as part of a web service, you should use a coroutine-aware web server like Corona. This will run each HTTP request in a coroutine, allowing you to handle multiple HTTP requests in parallel, in addition to being able to send HTTP requests in parallel. This will make very good use of memory, CPU, and network resources, and will be pretty easy to maintain!
(You can basically cut-n-paste our program from above into the coroutine that handles the HTTP request; it's fine to create new coroutines and join inside a coroutine.)
Looks like ParallelUserAgent is what you're looking for.
Well, you could create threads to do it--specifically see perldoc perlthrtut and Thread::Queue. So, it might look something like this.
use WWW::Mechanize;
use threads;
use threads::shared;
use Thread::Queue;

my @urls = (
    #whatever
);
my %results :shared;
my $queue = Thread::Queue->new();

foreach (@urls) {
    $queue->enqueue($_);
}

my @threads = ();
my $num_threads = 16;    #Or whatever...a pre-specified number of threads.

foreach ( 1 .. $num_threads ) {
    push @threads, threads->create( \&mechanize );
}

foreach (@threads) {
    $queue->enqueue(undef);
}

foreach (@threads) {
    $_->join();
}

consume( \%results );

sub mechanize {
    while ( my $url = $queue->dequeue ) {
        my $mech = WWW::Mechanize->new();
        $mech->get($url);
        $mech->form_number(1);
        $mech->set_fields( 'user' => 'myuser', pass => 'mypass' );
        my $resp = $mech->submit();
        $results{$url} = parse( $resp->content() );
    }
}
Note that since you're storing your results in a hash (instead of writing stuff to a file), you shouldn't need any kind of locking unless there's a danger of overwriting values, in which case you'll want to lock %results by replacing
$results{$url} = parse($resp->content());
with
{
    lock(%results);
    $results{$url} = parse($resp->content());
}
Try https://metacpan.org/module/Parallel::Iterator -- saw a very good presentation about it last week, and one of the examples was parallel retrieval of URLs -- it's also covered in the pod example. It's simpler than using threads manually (although it uses fork underneath).
As far as I can tell, you'd still be able to use WWW::Mechanize, but avoid messing about with memory sharing between threads. It's a higher-level model for this task, and might be a little simpler, leaving the main logic of @Jack Maney's mechanize routine intact.
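A sketch of what that might look like for the question's code, assuming Parallel::Iterator's iterate_as_array interface (parse() and consume() are the asker's own routines):
use strict;
use warnings;
use Parallel::Iterator qw( iterate_as_array );
use WWW::Mechanize;

my @urls = (
    'http://www.first.data.provider.com',
    'http://www.second.data.provider.com',
    'http://www.third.data.provider.com',
);

# Each worker runs in its own forked process; results come back in input order.
my @parsed = iterate_as_array(
    { workers => 5 },
    sub {
        my ( $index, $url ) = @_;
        my $mech = WWW::Mechanize->new();
        $mech->get($url);
        $mech->form_number(1);
        $mech->set_fields( 'user' => 'myuser', pass => 'mypass' );
        my $resp = $mech->submit();
        return parse( $resp->content() );
    },
    \@urls,
);

my %results;
@results{@urls} = @parsed;    # stitch back into the hash the question uses
consume(%results);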

Why doesn't Catalyst die only once in a chained action?

Consider the following actions:
sub get_stuff :Chained('/') :PathPart('stuff') :CaptureArgs(1) {
    my ( $self, $c, $stuff_id ) = @_;
    die "ARRRRRRGGGG";
}

sub view_stuff :Chained('get_stuff') :PathPart('') :Args(0) {
    die "DO'H";
}
Now if you request '/stuff/314/' , you'll get
Error: ARRRRG in get_stuff at ...
Error: DO'H in view_stuff at ...
Is there a reason why not just throw the error at the first failing chain link?
Why is catalyst trying to carry on the chain?
I'm not sure of the answer as to 'why' but I presume it was done that way to give flexibility.
You should probably catch the error with eval (or preferably something like Try::Tiny or TryCatch) and call $c->detach if you want to stop processing actions.
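For example, a hedged sketch of the first chain link using Try::Tiny (the action body is the question's, the error handling is illustrative):
use Try::Tiny;

sub get_stuff :Chained('/') :PathPart('stuff') :CaptureArgs(1) {
    my ( $self, $c, $stuff_id ) = @_;

    try {
        die "ARRRRRRGGGG";                 # whatever might actually fail
    }
    catch {
        $c->log->error("get_stuff failed: $_");
        $c->response->status(500);
        $c->detach;                        # stop here; view_stuff never runs
    };
}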
Catalyst::Plugin::MortalForward
The chosen answer may be outdated. Catalyst can die early, when the application's config key abort_chain_on_error_fix is set.
__PACKAGE__->config(abort_chain_on_error_fix => 1);
See the documentation about Catalyst configurations. It also states that this behaviour may become the default in the future.
Cucabit is right, detach is the way to go. As to the why: normally in a perl process, 'die' stops the process. In Catalyst you don't want that. If, for instance, you run your Catalyst app under FastCGI, you spawn one or more standalone processes that handle multiple requests. If the first request were to kill the process itself, the web server would have to respawn the FastCGI process in order to handle the next call. I think that is why Catalyst catches 'die' (it's used a lot as the default, as in 'do_something() or die $!') and turns it into an Exception.
You could also end the process with 'exit' I guess, but you end up with the same problems as above, killing the process.
What you can of course do is create your own 'die' method that logs the error passed with the default log object and then calls detach or something. It should also be possible to redefine the Catalyst exception handling, as anything is possible with Catalyst :)

Is there a way to have managed processes in Perl (i.e. a threads replacement that actually works)?

I have a multithreaded application in Perl for which I have to rely on several non-thread-safe modules, so I have been using fork()ed processes with kill() signals as a message passing interface.
The problem is that the signal handlers are a bit erratic (to say the least) and often end up with processes that get killed in inappropriate states.
Is there a better way to do this?
Depending on exactly what your program needs to do, you might consider using POE, which is a Perl framework for multi-threaded applications with user-space threads. It's complex, but elegant and powerful and can help you avoid non-thread-safe modules by confining activity to a single Perl interpreter thread.
Helpful resources to get started:
Programming POE presentation by Matt Sergeant (start here to understand what it is and does)
POE project page (lots of cookbook examples)
Plus there are hundreds of pre-built POE components you can use to assemble into an application.
You can always have a pipe between parent and child to pass messages back and forth.
pipe my $reader, my $writer;
my $pid = fork();
if ( $pid == 0 ) {
    close $reader;
    ...
}
else {
    close $writer;
    my $msg_from_child = <$reader>;
    ....
}
Not a very comfortable way of programming, but it shouldn't be 'erratic'.
Have a look at forks.pm, a "drop-in replacement for Perl threads using fork()" which makes for much more sensible memory usage (but don't use it on Win32). It will allow you to declare "shared" variables and then it automatically passes changes made to such variables between the processes (similar to how threads.pm does things).
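A small sketch of the drop-in idea, assuming forks and forks::shared are installed; the threads-style API stays the same, but each "thread" is really a forked process:
use forks;           # load instead of (and before) threads
use forks::shared;

my $counter : shared = 0;

my @kids = map {
    threads->create( sub {
        for ( 1 .. 1000 ) {
            lock($counter);
            $counter++;
        }
    } );
} 1 .. 4;

$_->join() for @kids;
print "counter is $counter\n";    # 4000 - updates were shared across processes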
From perl 5.8 onwards you should be looking at the core threads module. Have a look at http://metacpan.org/pod/threads
If you want to use modules which aren't thread safe you can usually load them with a require and import inside the thread entry point.
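A minimal sketch of that suggestion (the module name is hypothetical):
use strict;
use warnings;
use threads;

sub worker {
    # Load the non-thread-safe module inside the thread entry point,
    # rather than at compile time in the parent.
    require Some::NonThreadSafe::Module;    # hypothetical module
    Some::NonThreadSafe::Module->import();

    # ... use the module's functions here ...
}

threads->create( \&worker )->join();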