Why does a program with Parallel::Loops exhaust my memory? - perl

I've inherited some code at work that I'm trying to improve. My Perl skills are somewhat lacking, so I would love some assistance!
Essentially, this script SNMP-polls a network of thousands of nodes to update its local interface index cache. It is hitting a problem where it exhausts its memory and fails. Code as follows (heavily reduced, but I think you'll get the gist):
use strict;
use warnings;
use Parallel::Loops;
my %snmp_results;
my $maxProcs = 50;
my @exceptions;
my @devices;
my $pl = Parallel::Loops->new($maxProcs);
$pl->share(\%snmp_results, \@exceptions );
load_devices();
get_snmp_interfaces();
sub get_snmp_interfaces {
$pl->foreach( \@devices, sub {
my ($name, $community, $snmp_ver) = @$_;
# Create the new ifindex cache, and return an array reference to the new entries
my $result = getSNMPIFFull($name, $community, $snmp_ver);
if (defined $result && $result ne "") {
my %cache = %{$result};
print "Got cache for $name\n";
# Build hash of all the links polled through SNMP
# [ifindex, ifdesc, ifalias, ifspeed, ip]
for my $link (keys %cache) {
$snmp_results{$name}{$cache{$link}[0]} = [$cache{$link}[0], $cache{$link}[1], $cache{$link}[2], $cache{$link}[3], $cache{$link}[4]];
}
}
else {
push(@exceptions, "Unable to poll $name - $community - $snmp_ver");
}
});
}
This particular VM has 3.1GB of RAM allocatable and idles at about 83MB usage when this script is not running. If I drop maxProcs down to 25, it finishes fine, but this script can already take a long time given the sheer number of devices plus latency, so I would rather keep the parallelism high!
I have a feeling that the $pl->share() is sharing the ever-expanding %snmp_results with each forked process which is definitely not necessary since it's not reading/modifying other entries: just adding new entries. Is there a better way I can be doing this?
I'm also slightly unsure about my %cache = %{$result};. If this is just creating a pointer as a hash then cool but if it's doing a copy, that's also a bit wasteful!
Any help will be greatly appreciated!

The module's documentation can be found on CPAN.
There's one part talking about the performance:
Also, if each loop sub returns a massive amount of data, this needs to
be communicated back to the parent process, and again that could
outweigh parallel performance gains unless the loop body does some
heavy work too.
You are probably moving around complete copies of the variables in memory, pushing to the machine's limit if the MIB to poll and number of machines are big enough.
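On the side question about my %cache = %{$result};: that assignment does copy the hash. A minimal sketch of the difference, using a made-up hash reference in place of whatever getSNMPIFFull() really returns:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by getSNMPIFFull().
my $result = { 1 => [1, 'eth0', 'uplink', 1000, '10.0.0.1'] };

# Assigning %{$result} to a hash copies every key/value pair into a new hash
# (a shallow copy: the inner array references are still shared).
my %cache = %{$result};
$cache{2} = [2, 'eth1', 'spare', 100, '10.0.0.2'];
my $copy_is_separate = exists $result->{2} ? 0 : 1;

# Iterating through the reference avoids the extra top-level copy.
my %by_ifindex;
for my $link (keys %$result) {
    my $row = $result->{$link};
    $by_ifindex{ $row->[0] } = [ @{$row}[0 .. 4] ];
}

print "copy is separate: $copy_is_separate\n";
print "interfaces: ", join(',', sort keys %by_ifindex), "\n";
```

The copy is per device and short-lived, so it is unlikely to be the main cause of the memory growth, but working through the reference costs nothing.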
Since what you are doing is an I/O-intensive task and not a CPU-bound one that could benefit from parallel CPU processing, I would reconsider the approach of launching so many (50!) processes for polling.
Run the program with $maxProcs down to 1 to 5 processes and see how it behaves. Do some profiling of your code, attaching Devel::NYTProf to check where you are consuming time and if increasing the number of processes actually leads to a better performance.
Reconsider using Parallel::Loops for this task. You may get better performance with use threads and a hash shared between the different threads (use threads::shared).
Apologies if this could have been a comment. Starting in SO is difficult due to all the limitations that are in place :(
If you already found a solution it would be great if you could share with us your findings. I didn't know Parallel::Loops before and I think I can give it some use.

Related

Add parallelism to perl script

I have small perl script which gets services details from mongoDB, queries its statuses and gives html output
#...some stuff to get $token
my @cmd = ('/opt/mongo/bin/mongo', '127.0.0.1:27117/service_discovery', '--quiet', '-u', 'xxx', '-p', 'xxx', '--eval', "var environ='$env'; var action='status'", '/home/mongod/www/cgi/getstatus.js');
my $mongo_out;
run \@cmd, '>>', \$mongo_out;
$json->incr_parse ($mongo_out);
while (my $obj = $json->incr_parse) {
my $hostname = "$obj->{'hostname'}";
print "<tr><td colspan=4 align=\"center\"><h4>$hostname</h4></td></tr>";
foreach my $service (@{$obj->{'services'}}) {
my $name = "$service->{'name'}";
my $port = "$service->{'port'}";
my $proto = "$service->{'proto'}";
my $request = HTTP::Request->new(GET => "${proto}://$hostname:${port}/status/service");
$request->header(Authorization => "Bearer $token");
my $ua = LWP::UserAgent->new;
$ua->timeout(2);
my $response = $ua->request($request);
my $code = $response->code();
if ($code == 200) {
my $var = $response->content; # use the accessor instead of reaching into the object's internals
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
my $out = try {my $output = $coder->decode($var)} catch {undef};
if(exists $out->{'name'} && exists $out->{'version'}) {
print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td align=\"center\">$out->{'name'}</td><td align=\"center\">$out->{'version'}</td></tr>";
} else {
print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td colspan=2 align=\"center\">auth failed</td></tr>";
}
} elsif ($code == 500) {
print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td colspan=2 align=\"center\">offline</td></tr>";
} elsif ($code == 404) {
print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td colspan=2 align=\"center\">page not found</td></tr>";
}
}
}
It executes for a while, especially when some services are offline. Is it possible to query services within same host simultaneously?
This is almost a question that's too broad to answer, because ... it depends.
But yes. You have two and a half mechanisms for parallelising in Perl:
thread
fork
Non blocking IO.
I say two and a half, because non-blocking IO isn't really parallel, as much as solving the same problem a different way.
Implementation of parallelism is a really good way to end up with some horrific and hard to trace bugs, and requires a bit of a shift of mind set, because your code is no longer executing in a well defined sequence - the whole point is that your code might hit different bits at different times, and that can cause utter chaos.
And not least because modules you import - might well not be "thread safe" (which means they may be fine, but occasionally will break in a very unpredictable way, and you'll tear your hair out trying to track down the bug).
So with that in mind
threads
Perhaps slightly counter intuitively, if you've used threads in another language - perl threads are NOT light weight. There is a significant cost to starting them, not least because you effectively end up multiplying your memory footprint by the number of threads you are running.
I would normally suggest as a result - look at a "worker threads" model, using Thread::Queue. You start up a number of threads, and use queues to serialise the input and output from the threads.
forking
fork() is a unix native system call. You use it a lot, and it's quite efficient. It splits your program into two identical copies - including position within the code - at the point at which it's called. The only difference initially is the return code of the fork() system call - the parent will get the process ID of the child, the child will get zero.
It's quite easy to do strange things accidentally, as both copies of the code are at exactly the same point in terms of loop iterations, file handles, etc. They rapidly diverge, though, and you can again end up with some very strange things happening if you interact with 'shared' resources.
I would normally suggest looking at Parallel::ForkManager module as an easy way to avoid tripping yourself up with fork().
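To make the fork() description concrete, here is a rough sketch that uses only core modules (Storable and POSIX) rather than Parallel::ForkManager. The poll_host function and the host names are made up for illustration, and there is no cap on the number of children, which a real script would need:

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(freeze thaw);

# Hypothetical work function standing in for one slow network request.
sub poll_host {
    my ($host) = @_;
    return { host => $host, length => length $host };
}

# Fork one child per host; each child sends back only its own result.
sub run_in_children {
    my (@hosts) = @_;
    my (%reader_for, %results);

    for my $host (@hosts) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {                  # child: fork() returned zero
            close $reader;
            binmode $writer;
            print {$writer} freeze(poll_host($host));
            close $writer;                # flushes the buffered result
            POSIX::_exit(0);              # leave without running the parent's END blocks
        }

        close $writer;                    # parent: fork() returned the child's PID
        binmode $reader;
        $reader_for{$pid} = $reader;
    }

    for my $pid (keys %reader_for) {
        my $fh     = $reader_for{$pid};
        my $frozen = do { local $/; <$fh> };   # read until the child closes its end
        close $fh;
        waitpid($pid, 0);
        my $data = thaw($frozen);
        $results{ $data->{host} } = $data;
    }
    return \%results;
}

my $results = run_in_children(qw(alpha beta gamma));
print "$_ => $results->{$_}{length}\n" for sort keys %$results;
```

Each child hands back one small serialized structure over its own pipe, so nothing is shared implicitly and the parent decides what to keep.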
non blocking IO
You can often use something like IO::Select and its can_read method, which tells you which file handles are ready to be read without blocking. You skip the handles that would block and come back to them once they have data. This would also work for your use case, although it's not always applicable.
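A small sketch of the can_read idea, with two local pipes standing in for two network connections (the 'fast' and 'slow' labels are invented for the example):

```perl
use strict;
use warnings;
use IO::Select;

# Two pipes stand in for two slow network connections.
pipe(my $fast_r, my $fast_w) or die "pipe failed: $!";
pipe(my $slow_r, my $slow_w) or die "pipe failed: $!";

my %name_of = (fileno($fast_r) => 'fast', fileno($slow_r) => 'slow');
my $select  = IO::Select->new($fast_r, $slow_r);

# Only the "fast" peer has answered so far.
syswrite($fast_w, "status ok\n");

# Wait up to 0.5 seconds; only handles that are ready to read are returned.
my @ready = map { $name_of{ fileno $_ } } $select->can_read(0.5);
print "ready: @ready\n";

# sysread avoids mixing Perl's buffered reads with select().
for my $fh ($select->can_read(0)) {
    sysread($fh, my $line, 1024);
    print "got: $line";
}
```

The handle that has nothing to say is simply not returned, so the loop never stalls on it.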
I've got examples of both the above here: Perl daemonize with child daemons

Parallel execution in Perl

I have a Perl program that reads the packets of a flow from a pcap file, but it takes a lot of time. I want to make it parallel, but I don't know whether that is possible. If it is, can I do it with MPI? Another question: what is the best way to make this code parallel? Here is the relevant piece of my code (I think this is the part to parallelize, but I don't know the best way):
while (!eof($inFileH))
{
#inFileH is the handler of the pcap file
#in each while I read one packet
$ts_sec = readBytes($inFileH,4);
$ts_usec = readBytes($inFileH,4);
$incl_len = readBytes($inFileH,4);
$orig_len = readBytes($inFileH,4);
if ($totalLen == 0) # it is the 1st packet
{
$startTime = $ts_sec + $ts_usec/1000000;
}
$timeStamp = $ts_sec + $ts_usec/1000000 - $startTime;
$totalLen += $orig_len;
$#packet = -1; # emptying the array
for (my $i=0 ; $i<$incl_len ; $i++) #read all included octets of the current packet
{
read $inFileH, $packet[$i], 1;
$packet[$i] = ord($packet[$i]);
}
#and after that I will work on the "packet" and analyze it
}
so how should I send the file content for other processors to work on it in parallel.....
First you need to determine the bottleneck. If it is really CPU usage (i.e. CPU usage is at 100% while you are running the script), you need to figure out where the processing spends its time.
This may well be in the way that you are parsing the input. There may be obvious ways to speed this up. For instance, if you use complex regular expressions, and focus exclusively on matching input correctly, there may be ways to make the matching a lot faster by rewriting the expressions or doing simpler matches before trying more complex ones.
If you can't reduce CPU usage far enough in this way, and you really want to parallelize, see if you can employ the mechanism with which Perl was born: Unix pipes. You can write Perl scripts that pass data through to each other in a pipeline, or you can do the creation of the processes and pipes within Perl itself (see perlopentut, and if that isn't enough, perlipc).
As a general rule, I would consider these options first before trying other mechanisms, but it really depends on the details of what you're trying to do and the context in which you need to do it.
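As one example of a cheaper parse, and only a guess at where the time goes since the code has not been profiled: the loop in the question calls read once per byte. A sketch that reads each record with two read calls and unpack instead, assuming little-endian 32-bit header fields and replacing the question's readBytes helper (whose implementation isn't shown):

```perl
use strict;
use warnings;

# Read one pcap record: a 16-byte header, then incl_len bytes of packet data.
# Assumes little-endian 32-bit header fields ('V'); a big-endian capture would need 'N'.
sub read_packet {
    my ($fh) = @_;
    my $got = read($fh, my $header, 16);
    return unless defined $got && $got == 16;
    my ($ts_sec, $ts_usec, $incl_len, $orig_len) = unpack 'V4', $header;

    # One read() for the whole packet instead of one call per byte.
    $got = read($fh, my $data, $incl_len);
    return unless defined $got && $got == $incl_len;

    return {
        time     => $ts_sec + $ts_usec / 1_000_000,
        orig_len => $orig_len,
        bytes    => [ unpack 'C*', $data ],   # same numeric octets the ord() loop built
    };
}

# Usage with an in-memory file holding one made-up 3-byte packet.
my $record = pack('V4', 10, 500_000, 3, 3) . "\x01\x02\xff";
open(my $in, '<:raw', \$record) or die "cannot open in-memory file: $!";
my $packet = read_packet($in);
print "time=$packet->{time} bytes=@{ $packet->{bytes} }\n";
```

If profiling shows the analysis step dominates rather than the reading, this change will not help much and the pipeline approach is the one to look at.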

Should I re-use a single HTML::SimpleLinkExtor object for memory efficiency?

So this may seem like a silly question, but I'm building an application where memory is a very limited resource so I need to be as cautious about memory usage as I can. So my question is, which of the following is more memory efficient?
while(<LINKS_FILE>) {
my $extor = HTML::SimpleLinkExtor->new($resp->base); #$resp from above somewhere
$extor->parse($_);
my @links = $extor->links;
for my $link (@links) { print "$link\n" }
}
or
my $extor = HTML::SimpleLinkExtor->new($resp->base); #$resp from above somewhere
while(<LINKS_FILE>) {
$extor->parse($_);
my @links = $extor->links;
for my $link (@links) { print "$link\n" }
$extor->clear_links;
}
So the first creates a new HTML::SimpleLinkExtor object every time, whereas the second just resets the same one for reuse. It seems to me that the second would be more memory efficient, but to be honest I don't really know how good Perl is about releasing memory back to the OS, or whether it's going to hold on to the memory for some of the HTML::SimpleLinkExtor objects even after they're out of scope. Thanks for the help!
I am not inclined to spend time profiling, but if I were in your situation, I would try HTML::LinkExtor first. If you provide a callback, it will not save the links it finds internally, reducing the footprint of your application. You can then decide whether to store the links, or maybe write to an external file, to keep memory use to a minimum:
use HTML::LinkExtor;
my $parser = HTML::LinkExtor->new(sub {
my($tag, %links) = @_;
print "$tag @{[%links]}\n";
});
$parser->parse_file("index.html");

Caching & avoiding Cache Stampedes - multiple simultaneous calculations

We have a very expensive calculation that we'd like to cache. So we do something similar to:
my $result = $cache->get( $key );
unless ($result) {
$result = calculate( $key );
$cache->set( $key, $result, '10 minutes' );
}
return $result;
Now, during calculate($key), before we store the result in the cache, several other requests come in, that also start running calculate($key), and system performance suffers because many processes are all calculating the same thing.
Idea: Let's put a flag in the cache saying that a value is being calculated, so the other requests just wait for that one calculation to finish and then all use its result. Something like:
my $result = $cache->get( $key );
if ($result) {
while ($result =~ /Wait, \d+ is running calculate../) {
select(undef, undef, undef, 0.5); # built-in sleep() would truncate 0.5 to 0 and busy-loop
$result = $cache->get( $key );
}
} else {
$cache->set( $key, "Wait, $$ is running calculate()", '10 minutes' );
$result = calculate( $key );
$cache->set( $key, $result, '10 minutes' );
}
return $result;
Now that opens up a whole new can of worms. What if $$ dies before it sets the cache. What if, what if... All of them solvable, but since there is nothing in CPAN that does this (there is something in CPAN for everything), I start wondering:
Is there a better approach? Is there a particular reason e.g. Perl's Cache and Cache::Cache classes don't provide some mechanism like this? Is there a tried and true pattern I could use instead?
Ideal would be a CPAN module with a debian package already in squeeze or a eureka moment, where I see the error of my ways... :-)
EDIT: I have since learned that this is called a Cache stampede and have updated the question's title.
flock() it.
Since your worker processes are all on the same system, you can probably use good, old-fashioned file locking to serialize the expensive calculate()ions. As a bonus, this technique appears in several of the core docs.
use Fcntl qw(:DEFAULT :flock); # warning: this code not tested
use constant LOCKFILE => 'you/customize/this/please';
my $result = $cache->get( $key );
unless ($result) {
# Get an exclusive lock
my $lock;
sysopen($lock, LOCKFILE, O_WRONLY|O_CREAT) or die;
flock($lock, LOCK_EX) or die;
# Did someone update the cache while we were waiting?
$result = $cache->get( $key );
unless ($result) {
$result = calculate( $key );
$cache->set( $key, $result, '10 minutes' );
}
# Exclusive lock released here as $lock goes out of scope
}
return $result;
Benefit: worker death will instantly release the $lock.
Risk: LOCK_EX can block forever, and that is a long time. Avoid SIGSTOPs, perhaps get comfortable with alarm().
Extension: if you don't want to serialize all calculate() calls, but merely all calls for the same $key or some set of keys, your workers can flock() /some/lockfile.$key_or_a_hash_of_the_key.
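The per-key extension might look roughly like this. The helper name with_key_lock, the lock file naming, and the use of the system temp directory are all assumptions made for the sketch, and like the code above it has not been run against a real cache:

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Hypothetical helper: run $code while holding an exclusive lock for $key only.
# The lock directory (the system temp dir) is an assumption; pick your own.
sub with_key_lock {
    my ($key, $code) = @_;
    my $path = File::Spec->catfile(File::Spec->tmpdir, 'calc.' . md5_hex($key) . '.lock');

    sysopen(my $lock, $path, O_WRONLY | O_CREAT) or die "cannot open $path: $!";
    flock($lock, LOCK_EX) or die "cannot lock $path: $!";

    my $result = $code->();
    close $lock;    # closing the handle releases the lock
    return $result;
}

my $value = with_key_lock('report:2024', sub { 6 * 7 });
print "value: $value\n";
```

Hashing the key keeps arbitrary key strings out of the file name. The code reference is where the "check the cache again, then calculate" step from the example above would go.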
Use lock? Or maybe that would be an overkill? Or if it is possible, precalculate the result offline then use it online?
Although it may (or may not) be overkill for your use case, have you considered using a message queue for the processing? RabbitMQ seems to be a popular choice in the Perl community at the moment and it is supported through the AnyEvent::RabbitMQ module.
The basic strategy in this case would be to submit a request to the message queue whenever you need to calculate a new key. The queue could then be set to calculate only a single key at a time (in the order requested) if that's all you can reliably handle. Alternately, if you can safely compute multiple keys concurrently, the queue can also be used to consolidate multiple requests for the same key, computing it once and returning the result to all clients who requested that key.
Of course, this would add a bit of complexity and AnyEvent calls for a somewhat different programming style than you may be used to (I would offer an example, but I've never really gotten the hang of it myself), but it may offer sufficient gains in efficiency and reliability to make those costs worth your while.
I agree generally with pilcrow's approach above. I would add one thing to it: Investigate the use of the memoize() function to potentially speed up the calculate() operation in your code.
See http://perldoc.perl.org/Memoize.html for details
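For reference, a minimal Memoize sketch with a made-up calculate() that just counts how often it really runs:

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical stand-in for the expensive calculate() from the question.
sub calculate {
    my ($key) = @_;
    $calls++;
    return length($key) * 2;
}

memoize('calculate');

my $first  = calculate('abc');
my $second = calculate('abc');   # served from Memoize's in-process cache
print "result=$second calls=$calls\n";
```

Note that by default this cache lives inside a single process and never expires, so it trims repeated work within one worker but does not by itself stop several workers from calculating the same key at once.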

Perl IPC - FIFO and daemons & CPU Usage

I have a mail parser perl script which is called every time a mail arrives for a user (using .qmail). It extracts a calendar attachment out of the mail and places the "path" of the file in a FIFO queue implemented using the Directory::Queue module.
Another perl script which reads the path of the calendar attachment and performs certain file operations on the local system as well as on the remote CalDAV server, is being run as a daemon, as explained here. So basically this script looks like:
my $declarations
sub foo {
.
.
}
sub bar {
.
.
}
while ($keep_running) {
for(keep-checking-the-queue-for-new-entries) {
sub caldav_logic1 {
.
.
}
sub caldav_logic2 {
.
.
}
}
}
I am using Proc::Daemon to run the script as a daemon. The problem is that this process has almost 100% CPU usage. What are the suggested ways to implement the daemon in a more standard, safer way? I am using pretty much the same code as in the linked Proc::Daemon example.
I bet it is your for loop and checking for new queue entries.
There are ways to watch a directory for file changes. These ways are OS dependent but there might be a Perl module that wraps them up for you. Use that instead of busy looping. Even with a sleep delay, the looping is inefficient when you can have your program told exactly when to wake up by an OS event.
File::ChangeNotify looks promising.
Maybe you don't want truly continuous polling. Is keep-checking-the-queue-for-new-entries a CPU-intensive part of the code, even when the queue is empty? That would explain why your processor is always busy.
Try putting a sleep 1 statement at the very top (or very bottom) of the while loop to let the processor rest between queue checks. If that doesn't degrade the program performance too much (i.e., if everyone can tolerate waiting an extra second before the company calendars get updated) and if the CPU usage still seems high, try sleep 2, sleep 5, etc.
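A rough shape for that loop, with a plain array standing in for the Directory::Queue lookup and a two-round limit so the sketch terminates (a real daemon would keep testing $keep_running instead):

```perl
use strict;
use warnings;

# Hypothetical in-memory queue standing in for the Directory::Queue lookup.
my @queue = ('/tmp/a.ics', '/tmp/b.ics');
my @processed;
my $idle_rounds = 0;

# A real daemon would loop on $keep_running; this sketch stops after two idle rounds.
while ($idle_rounds < 2) {
    if (defined(my $path = shift @queue)) {
        push @processed, $path;       # the CalDAV work would go here
        next;                         # more work may be waiting: do not sleep yet
    }
    $idle_rounds++;
    sleep 1;                          # queue empty: yield the CPU before checking again
}
print "processed: @processed\n";
```

Sleeping only when the queue is empty keeps the backlog draining at full speed while still letting the CPU rest between checks.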
cpan Linux::Inotify2
The kernel knows when files change and sends this information to your program which runs the sub. Maybe this will be better because the program will run the sub only when the file is changed.