How do I disable Devel::Cover for forked child processes? - perl

I noticed that when I run my program with perl -MDevel::Cover=-silent,-nogcov foo.pl to collect coverage information for foo.pl, I get massive slowdowns from the parts of my program that fork and exec non-Perl programs like tar, gzip or dpkg-deb. Thanks to this question I figured out how to disable Devel::Cover selectively, so I'm now writing:
my $is_covering = !!(eval 'Devel::Cover::get_coverage()');
my $pid = fork();
if ($pid == 0) {
    eval 'Devel::Cover::set_coverage("none")' if $is_covering;
    exec 'tar', '-cf', ...
}
Doing so shaves five minutes of runtime off each test which, across 122 tests, saves me about ten hours of computation time.
Unfortunately, I cannot always add this eval statement to the forked child process. For example, it's impossible to do so when I use system(). I want to avoid rewriting each of my system() calls as a manual fork/exec.
Is there a way to disable Devel::Cover for my forked processes or basically for everything that is not my script foo.pl?
Thanks!

Forks::Super is kind of heavy, but it has the feature of post-fork callbacks that are executed after each fork but before any other code in a child process is executed.
use Forks::Super;
my $is_covering = !!(eval 'Devel::Cover::get_coverage()');
POSTFORK_CHILD {
    # runs in every child process immediately after fork()
    eval 'Devel::Cover::set_coverage("none")' if $is_covering;
};
...
...

I suspect your problem is not the fork per se, but rather the exec. The difference is somewhat academic but might lead to a possible solution. If you don't mind compiling your own version of Devel::Cover you could try commenting out this line: https://github.com/pjcj/Devel--Cover/blob/05392f3062dd2bdbf019d9a8fbae1b152b97d862/Cover.xs#L1140
This will cause any coverage data collected before an exec call to be lost and speed up the exec call.
If you can't compile your own version, adding local *Devel::Cover::_report = sub { }; before the exec calls should also speed up the execs, but this is ultimately a similar solution to the one you already have, with the added disadvantage of relying on an unpublished API.
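A hedged sketch of that second idea, wrapped around system() so it also covers calls you can't rewrite as fork/exec. Note that quiet_system is a made-up name for illustration, and _report is an internal, unpublished Devel::Cover routine, so this may break with future versions:

```perl
use strict;
use warnings;

# Detect whether Devel::Cover is active, as in the question.
my $is_covering = !!(eval 'Devel::Cover::get_coverage()');

# Hypothetical wrapper: mute Devel::Cover's internal _report for the
# duration of one system() call.
sub quiet_system {
    my @cmd = @_;
    if ($is_covering) {
        no warnings 'redefine';
        # local restores the original _report when this block is left.
        local *Devel::Cover::_report = sub { };
        return system @cmd;
    }
    return system @cmd;
}

# Used exactly like system().
my $status = quiet_system($^X, '-e', 'exit 0');
```

This keeps the override scoped to the wrapper, so normal coverage collection resumes as soon as the external command returns.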

Related

perl Win32 Signal Handling between perl processes

I have a couple of long-running Perl scripts on Windows (Strawberry Perl) that I'm working on.
The first process is a parent monitoring process. It restarts the child process every 24 hours and will be always running.
The second is the child payment processing script. It is imperative that this process completes whatever it's doing before being shutdown.
It's my understanding that signal handling doesn't work in Perl on Win32 and that it shouldn't be relied on. Is there some other way I can handle a signal? Win32::Process::Kill seems to kill the process without letting it shut down safely.
This is the signal handling that I've tried...
# Child
my $interrupted = 0;
$SIG{INT} = sub { $interrupted = 1; };
while (!$interrupted) {
    # keep doing your thing, man
}

# Parent
use IPC::Open2;
my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'C:\\strawberry\\perl\\bin\\perl.exe', 'process.pl');
kill INT => $pid;
waitpid($pid, 0);
The only other thing I can think of is to open a socket between the two processes and write messages across the socket. But there must be something easier. Anyone know of any module that can do this?
Update
I've started working on creating a "signal" mechanism via IO::Socket::INET and IO::Select by opening a socket. This appears to work and I'm thinking of writing a module that is compatible with AnyEvent. But I'm still interested in an implementation that doesn't require opening a listening port and that doesn't require a server/client relationship. Is it possible to do this by subscribing to and firing custom events in windows?
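For what it's worth, the loopback-socket "signal" idea from the update can be sketched in a few lines. Both ends are shown in one script here just to illustrate the handshake; the message format is an arbitrary choice, and in the real setup the child would poll the listener with IO::Select inside its work loop:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# "Child" side: listen on a local port; a connection acts as the signal.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,          # let the OS pick a free port
    Listen    => 1,
) or die "listen: $!";
my $port = $listener->sockport;

# "Parent" side: connecting and writing is the "kill".
my $notifier = IO::Socket::INET->new(
    PeerAddr => "127.0.0.1:$port",
) or die "connect: $!";
print {$notifier} "INT\n";
close $notifier;

# "Child" side again: accept and read the pseudo-signal name.
my $conn = $listener->accept;
chomp(my $sig = <$conn>);
print "received pseudo-signal: $sig\n";
```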
Hmm, an interesting question. One thing I'd be wondering - how feasible is it to rewrite your code to use threads?
When faced with a similar problem I found encapsulating the 'child' process as a thread meant I could better 'manage' it from the parent.
e.g.:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my $interrupted : shared;

sub child {
    while ( not $interrupted ) {
        # loop;
    }
}

# main process
while (1) {
    $interrupted = 0;
    my $child = threads->create( \&child );
    sleep 60;
    $interrupted = 1;
    $child->join();
    sleep 3600;
}
And because you've got the IPC mechanisms that come with threading - Thread::Queue, threads::shared, Thread::Semaphore - you have more options. I'm also fairly sure you can send pseudo 'kill' signals within the script, because the threads module emulates 'kill' signals internally too.
http://www.perlmonks.org/?node_id=557328
Add to your thread:
$SIG{'TERM'} = sub { threads->exit(); };
And then your 'main' can:
$thr->kill('TERM')->detach();
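Putting those pieces together, here is a minimal self-contained sketch. It uses a flag-based handler and join() as one graceful variant; the threads->exit() plus detach() combination above is another:

```perl
use strict;
use warnings;
use threads;

my $thr = threads->create(sub {
    my $done = 0;
    $SIG{'TERM'} = sub { $done = 1 };   # per-thread pseudo-signal handler
    sleep 1 until $done;                # stands in for the real work loop
    return 'clean exit';
});

sleep 2;
$thr->kill('TERM');       # delivered at a safe point inside the thread
my $result = $thr->join();
print $result, "\n";
```

Note that these are pseudo-signals delivered by the interpreter between operations, not OS signals, which is exactly why they work on Windows.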
With ActiveState Perl I use windows native events through the Win32::Event module.
This way you don't need to implement anything fancy, and you can even have your script interact with native applications.
I use this module in many applications.
Since it is part of Win32::IPC, which uses native code, it may not be available for Strawberry Perl. If that is the case you could try compiling it from the CPAN sources. It might be worth a try if you have lots of Windows perl-based software.

What are the Perl techniques to detach just a portion of code to run independently?

I'm not involved in close-to-OS programming techniques, but as I know, when it comes to doing something in parallel in Perl the weapon of choice is fork and probably some useful modules built upon it. The doc page for fork says:
Does a fork(2) system call to create a new process running the same program at the same point.
As a consequence, having a big application that consumes a lot of memory and calling fork for a small task means there will be 2 big perl processes, and the second will waste resources just to do some simple work.
So, the question is: what to do (or how to use fork, if it's the only method) in order to have a detached portion of code running independently and consuming just the resources it needs?
Just a very simple example:
use strict;
use warnings;
my @big_array = ( 1 .. 2000000 ); # at least 80 MB memory
sleep 10; # to have time to inspect the memory usage easily
fork();
sleep 10; # to have time to inspect the memory usage easily
and the child process consumes 80+ MB too.
To be clear: it's not important to communicate to this detached code or to use its result somehow, just to be possible to say "hey, run for me this simple task in the background and let me continue my heavy work meanwhile ... and don't waste my resources!" when running a heavy perl application.
fork() to exec() is your bunny here. You fork() to create a new process (which is a fairly cheap operation, see below), then exec() to replace the big perl you've got running with something smaller. It looks like this:
use strict;
use warnings;
use 5.010;
my @ary = (1 .. 10_000_000);

if (my $pid = fork()) {
    # parent
    say "Forked $pid from $$; sleeping";
    sleep 1_000;
} else {
    # child
    exec($^X, '-e', 'sleep 1_000');
}
(@ary was just used to fill up the original process's memory a bit.)
I said that fork()ing was relatively cheap, even though it does copy the entire original process. These statements are not in conflict; the people who designed fork noticed this same problem. The copy is lazy, that is, only the bits that are actually changed are copied.
If you find you want the processes to talk to each other, you'll start getting into the more complex domain of IPC, about which a number of books have been written.
Your forked process is not actually using 80MB of resident memory. A large portion of that memory will be shared - 'borrowed' from the parent process until either the parent or child writes to it, at which point copy-on-write semantics will cause the memory to actually be copied.
If you want to drop that baggage completely, run exec in your fork. That will replace the child Perl process with a different executable, thus freeing the memory. It's also perfect if you don't need to communicate anything back to the parent.
There is no way to fork just a subset of your process's footprint, so the usual workarounds come down to:
fork before you run memory intensive code in the parent process
start a separate process with system or open HANDLE,'|-',.... Of course this new process won't inherit any data from its parent, so you will need to pass data to this child somehow.
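As a sketch of that second workaround, a piped open hands data to a fresh child process over its stdin. The -ne one-liner here just stands in for a real worker script:

```perl
use strict;
use warnings;

# Start a fresh perl as the child; it inherits no data, only the pipe.
open my $child, '|-', $^X, '-ne', 'print "got: $_"'
    or die "open: $!";
print {$child} "$_\n" for 1 .. 3;   # stream work items to the child
close $child or die "child failed: $?";
```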
fork() as implemented on most operating systems is nicely efficient. It commonly uses a technique called copy-on-write, meaning that pages are initially shared until one or the other process writes to them. Also, a lot of your process memory is going to be read-only mapped files anyway.
Just because one process uses 80MB before fork() doesn't mean that afterwards the two will use 160. To start with it will be only a tiny fraction more than 80MB, until each process starts dirtying more pages.

perl how to mock thread behavior

I want to write unit tests for a subroutine in perl. The subroutine is using multiple threads to do its tasks. So, it first creates some threads and then it waits for them to join.
The problem is that our unit tests run on a server which is not able to run multi-threaded tests, so I need to somehow mock out the thread behavior. Basically I want to override the threads create and join functions so that it's not threaded anymore. Any pointers on how I can do that and test the code?
Edit : The server fails to run the threaded code for the following reason:
Devel::Cover does not yet work with threads
Update: this answer doesn't solve the OP's problem as described in the edited question, but it might be useful to someone.
Perl threads are an interpreter emulation, not an operating system feature. So, they should work on any platform. If your testing server doesn't support threads, it's probably for one of these reasons:
Your version of Perl is very old.
Perl was compiled without thread support.
Your testing framework wasn't created with threaded code in mind.
The first two could be easily rectified by updating your environment. However, I suspect yours is the third issue.
I don't think you should solve this by mocking the thread behavior. This changes the original code too much to be a valid test. And it would be a significant amount of work anyway, so why not direct that effort toward getting a threaded test working?
The exact issues depend on your code, but probably the issue is that your subroutine starts a thread and then returns, with the thread still running. Then your test framework runs the sub over and over, accumulating a whole bunch of concurrent threads.
In that case, all you need is a wrapper sub that calls the sub you are testing, and then blocks until the threads are complete. This should be fairly simple. Take a look at threads->list() to see how you can detect running threads. Just have a loop that waits until the threads in question are no longer running before exiting the wrapper sub.
Here is a simple complete example demonstrating a wrapper sub:
#!/usr/bin/perl
use strict;
use warnings;
use threads;

sub sub_to_test {
    threads->create(sub { sleep 5; print("Thread done\n"); threads->detach() });
    return "Sub done\n";
}

sub wrapper {
    # Get a count of the running threads.
    my $original_running_threads = threads->list(threads::running);
    my @results = sub_to_test(@_);
    # Block until the number of running threads is the same as when we started.
    sleep 1 while (threads->list(threads::running) > $original_running_threads);
    return @results;
}

print wrapper;

Remove Perl module from child stack

I have a daemon which loads DBI (DBD::mysql) and then forks child processes. I'd like to prevent the DBI module from being in memory in the forked child processes.
So something like this:
#!/usr/bin/perl
use DBI;
my $dbh = DBI->connect(db_info);
my $pid = fork();
if ($pid == 0) {
    # The forked child process here should not have DBI loaded
}
Thanks for the help!
Loading a module is to execute it like a script. There's absolutely no difference between a module and a script to Perl. To unload a module, one would need to undo the effects of running it. That can't be done mechanically, and it's not feasible to do manually.
The simplest solution would to be to have the child exec something. It could even be the script you are already running.
exec($^X, $0, '--child', @args)
The child can be given access to the socket by binding it to the child's fd 0 (stdin) and fd 1 (stdout).
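A minimal sketch of that self-exec pattern. The --child flag name and the "lean" code path are illustrative choices, not part of any API:

```perl
use strict;
use warnings;

# Child entry point: a fresh perl re-running this file with --child
# never reaches the heavy module loads further down.
if (@ARGV && $ARGV[0] eq '--child') {
    print "child $$ running lean\n";
    exit 0;
}

# Parent path: heavy modules (DBI etc.) would be loaded down here.
my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    exec($^X, $0, '--child') or die "exec: $!";
}
waitpid($pid, 0);
```

Because exec replaces the child's process image entirely, nothing the parent had compiled or loaded survives into the child.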
You can't do that easily unless you put the load after the fork. But to do that you have to not use use. Do this instead:
my $pid = fork();
if ($pid) {
    # parent: load DBI only now, after the fork
    require DBI;
    DBI->import;
} else {
    # child: DBI is never loaded here
}
That should prevent the DBI module from loading until after the fork. The use routine essentially does a require/import, but inside a BEGIN {} block, which is why you have to avoid use here.
If you are running a modern Linux system, then forks are COW (copy-on-write). This means pages from the parent are only copied into the child's address space if they are modified by the parent or the child. So the DBI module's memory is shared with the parent rather than duplicated in the forked child processes.
Perl 5 does not have any way of unloading modules from memory. If you really need the children to have different code than the parent for some reason, you are better off separating that code out of the main code as its own script and then using exec after the fork to run the child script. This will be slower than normal forking since it has to compile the child code, so if you fork a lot, it might be better to have two scripts that talk to each other over sockets and have the "child" script pre-fork.
Knowing now what you want to do with this: since there isn't a good way to unload modules in Perl, a good solution to the problem is to write an authentication server separate from the application server. The application server asks the authentication server whether an IP has permissions. That way they remain wholly separate processes. This might also have security benefits: your application code can't access your authentication database.
Since any given application is likely to expand to the point where it needs a SQL database of its own, this exercise is probably futile, but your call.
This is a bunch of extra work, maintenance and complexity. It's only worthwhile if it's causing you real memory problems, not just because it bugs you. Remember, RAM is very cheap. Developer time is very expensive.

Is there a way to have managed processes in Perl (i.e. a threads replacement that actually works)?

I have a multithreaded application in Perl for which I have to rely on several non-thread-safe modules, so I have been using fork()ed processes with kill() signals as a message-passing interface.
The problem is that the signal handlers are a bit erratic (to say the least) and often end up with processes that get killed in inappropriate states.
Is there a better way to do this?
Depending on exactly what your program needs to do, you might consider using POE, which is a Perl framework for multi-threaded applications with user-space threads. It's complex, but elegant and powerful and can help you avoid non-thread-safe modules by confining activity to a single Perl interpreter thread.
Helpful resources to get started:
Programming POE presentation by Matt Sergeant (start here to understand what it is and does)
POE project page (lots of cookbook examples)
Plus there are hundreds of pre-built POE components you can use to assemble into an application.
You can always have a pipe between parent and child to pass messages back and forth.
pipe my $reader, my $writer;
my $pid = fork();
if ( $pid == 0 ) {
    close $reader;
    ...
}
else {
    close $writer;
    my $msg_from_child = <$reader>;
    ...
}
Not a very comfortable way of programming, but it shouldn't be 'erratic'.
Have a look at forks.pm, a "drop-in replacement for Perl threads using fork()" which makes for much more sensible memory usage (but don't use it on Win32). It will allow you to declare "shared" variables and then it automatically passes changes made to such variables between the processes (similar to how threads.pm does things).
From perl 5.8 onwards you should be looking at the core threads module. Have a look at http://metacpan.org/pod/threads
If you want to use modules which aren't thread safe you can usually load them with a require and import inside the thread entry point.
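A small sketch of that pattern, using POSIX purely as a stand-in for some non-thread-safe module you only want loaded inside the thread:

```perl
use strict;
use warnings;
use threads;

my $thr = threads->create(sub {
    # Load the module inside the thread entry point, at run time,
    # rather than with a compile-time use in the main script.
    require POSIX;
    POSIX->import('floor');
    return POSIX::floor(2.7);
});
my $result = $thr->join();
print $result, "\n";
```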