Caching & avoiding Cache Stampedes - multiple simultaneous calculations - perl

We have a very expensive calculation that we'd like to cache. So we do something similar to:
my $result = $cache->get( $key );
unless ($result) {
    $result = calculate( $key );
    $cache->set( $key, $result, '10 minutes' );
}
return $result;
Now, during calculate($key), before we store the result in the cache, several other requests come in that also start running calculate($key), and system performance suffers because many processes are all calculating the same thing.
Idea: Let's put a flag in the cache that a value is being calculated, so the other requests just wait for that one calculation to finish and then all use it. Something like:
my $result = $cache->get( $key );
if ($result) {
    while ($result =~ /Wait, \d+ is running calculate../) {
        sleep 0.5;   # note: fractional sleep needs Time::HiRes qw(sleep); core sleep truncates to whole seconds
        $result = $cache->get( $key );
    }
} else {
    $cache->set( $key, "Wait, $$ is running calculate()", '10 minutes' );
    $result = calculate( $key );
    $cache->set( $key, $result, '10 minutes' );
}
return $result;
Now that opens up a whole new can of worms. What if $$ dies before it sets the cache? What if, what if... All of them solvable, but since there is nothing in CPAN that does this (and there is something in CPAN for everything), I start wondering:
Is there a better approach? Is there a particular reason why, e.g., Perl's Cache and Cache::Cache classes don't provide some mechanism like this? Is there a tried and true pattern I could use instead?
Ideal would be a CPAN module with a Debian package already in squeeze, or a eureka moment where I see the error of my ways... :-)
EDIT: I have since learned that this is called a Cache stampede and have updated the question's title.

flock() it.
Since your worker processes are all on the same system, you can probably use good, old-fashioned file locking to serialize the expensive calculate() calls. As a bonus, this technique appears in several of the core docs.
use Fcntl qw(:DEFAULT :flock);  # warning: this code not tested
use constant LOCKFILE => 'you/customize/this/please';

my $result = $cache->get( $key );
unless ($result) {
    # Get an exclusive lock
    my $lock;
    sysopen($lock, LOCKFILE, O_WRONLY|O_CREAT) or die;
    flock($lock, LOCK_EX) or die;

    # Did someone update the cache while we were waiting?
    $result = $cache->get( $key );
    unless ($result) {
        $result = calculate( $key );
        $cache->set( $key, $result, '10 minutes' );
    }
    # Exclusive lock released here as $lock goes out of scope
}
return $result;
Benefit: worker death will instantly release the $lock.
Risk: LOCK_EX can block forever, and that is a long time. Avoid SIGSTOPs, perhaps get comfortable with alarm().
Extension: if you don't want to serialize all calculate() calls, but merely all calls for the same $key or some set of keys, your workers can flock() /some/lockfile.$key_or_a_hash_of_the_key.
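A rough sketch of that per-key variant (untested; the lock directory and the use of Digest::MD5 to derive a safe filename are my assumptions, not part of the original code):

use Fcntl qw(:DEFAULT :flock);
use Digest::MD5 qw(md5_hex);

sub lock_key {
    my $key  = shift;
    my $path = '/var/lock/myapp/' . md5_hex($key);   # hypothetical lock directory
    sysopen(my $lock, $path, O_WRONLY|O_CREAT) or die "cannot open $path: $!";
    flock($lock, LOCK_EX) or die "cannot flock $path: $!";
    return $lock;   # lock is released when this handle goes out of scope
}

my $result = $cache->get( $key );
unless ($result) {
    my $lock = lock_key($key);       # serializes only callers working on this $key
    $result = $cache->get( $key );   # re-check after acquiring the lock
    unless ($result) {
        $result = calculate( $key );
        $cache->set( $key, $result, '10 minutes' );
    }
}
return $result;

The second get() after acquiring the lock is what prevents the stampede: only the first process to grab the lock recomputes, and everyone else finds the fresh cache entry.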

Use a lock? Or maybe that would be overkill? Or, if it is possible, precalculate the result offline and then use it online?

Although it may (or may not) be overkill for your use case, have you considered using a message queue for the processing? RabbitMQ seems to be a popular choice in the Perl community at the moment and it is supported through the AnyEvent::RabbitMQ module.
The basic strategy in this case would be to submit a request to the message queue whenever you need to calculate a new key. The queue could then be set to calculate only a single key at a time (in the order requested) if that's all you can reliably handle. Alternately, if you can safely compute multiple keys concurrently, the queue can also be used to consolidate multiple requests for the same key, computing it once and returning the result to all clients who requested that key.
Of course, this would add a bit of complexity and AnyEvent calls for a somewhat different programming style than you may be used to (I would offer an example, but I've never really gotten the hang of it myself), but it may offer sufficient gains in efficiency and reliability to make those costs worth your while.

I agree generally with pilcrow's approach above. I would add one thing to it: Investigate the use of the memoize() function to potentially speed up the calculate() operation in your code.
See http://perldoc.perl.org/Memoize.html for details
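For illustration, a minimal sketch of what that looks like (note that Memoize caches per process only, so it complements rather than replaces the shared cache above):

use Memoize;
memoize('calculate');            # results are cached per distinct argument, in this process

my $result = calculate($key);    # first call computes; repeat calls with the same $key return the cached value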

Related

child fork process return values

While using Parallel::ForkManager, I have a few doubts. If I am calling a child process in a for loop, then who will execute the next statement, the parent or the child? Code:
my $pm = Parallel::ForkManager->new($forks);
foreach my $q (@numbers) {
    my $pid = $pm->start and next;
    my $res = calc($q);
    if ($res == error) {
        return;
    }
    if (@res == some_no) {
        # do something and next;
    }
    $pm->finish(0, { result => $res, input => $q });
}
I want to know about fork's return values, and I want the parent process to execute the first next and the second next.
Also, I want to know: if a child process dies in the middle, will the parent be able to know, and how?
The two major sources of parallelism in Perl are threading (use threads;) and forking. For the latter, Parallel::ForkManager is probably the best bet out there.
However, for copying? This may not help nearly as much as you think. Your limiting factor isn't going to be CPU, it'll be IO to disk.
Parallelising IO doesn't help nearly as much as you think, and in many cases it can be counter-productive: by making the disk thrash and having to write to two locations, you lower overall throughput.
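On the narrower question of getting results back from the children: Parallel::ForkManager can pass the data structure a child hands to finish() back to a run_on_finish callback in the parent. A rough, self-contained sketch (calc() and the input list here are stand-ins for the question's code):

use strict;
use warnings;
use Parallel::ForkManager;

my @numbers = (1 .. 8);               # stand-in for the question's input list
sub calc { return $_[0] * $_[0] }     # stand-in for the real calc()

my $pm = Parallel::ForkManager->new(4);

# Runs in the parent each time a child exits; $data is the hashref
# that the child passed to finish().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    if (defined $data) {
        print "child $pid: input $data->{input} => result $data->{result}\n";
    } else {
        print "child $pid exited (code $exit_code) without returning data\n";
    }
});

foreach my $q (@numbers) {
    $pm->start and next;   # parent gets the child's PID (true) and moves on; child gets 0 and falls through
    my $res = calc($q);
    $pm->finish(0, { result => $res, input => $q });   # child exits here
}
$pm->wait_all_children;

The run_on_finish callback also receives the child's exit code and signal, which is how the parent can tell if a child died partway through.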

Why does a program with Parallel::Loops exhaust my memory?

I've inherited some code at work that I'm trying to improve on. My Perl skills are somewhat lacking, so I would love some assistance!
Essentially, this script is SNMP-polling a network of thousands of nodes to update its local interface index cache. I've found it's hitting a problem where it exhausts its memory and fails. Code as follows (heavily reduced, but I think you'll get the gist):
use strict;
use warnings;
use Parallel::Loops;

my %snmp_results;
my $maxProcs = 50;
my @exceptions;
my @devices;

my $pl = Parallel::Loops->new($maxProcs);
$pl->share( \%snmp_results, \@exceptions );

load_devices();
get_snmp_interfaces();

sub get_snmp_interfaces {
    $pl->foreach( \@devices, sub {
        my ($name, $community, $snmp_ver) = @$_;
        # Create the new ifindex cache, and return an array reference to the new entries
        my $result = getSNMPIFFull($name, $community, $snmp_ver);
        if (defined $result && $result ne "") {
            my %cache = %{$result};
            print "Got cache for $name\n";
            # Build hash of all the links polled through SNMP
            # [ifindex, ifdesc, ifalias, ifspeed, ip]
            for my $link (keys %cache) {
                $snmp_results{$name}{$cache{$link}[0]} = [ $cache{$link}[0], $cache{$link}[1], $cache{$link}[2], $cache{$link}[3], $cache{$link}[4] ];
            }
        }
        else {
            push(@exceptions, "Unable to poll $name - $community - $snmp_ver");
        }
    });
}
This particular VM has 3.1GB of allocatable RAM and idles at about 83MB of usage when this script is not running. If I drop maxProcs down to 25, it finishes fine, but this script can already take a long time given the sheer number of devices plus latency, so I would rather keep the parallelism high!
I have a feeling that $pl->share() is sharing the ever-expanding %snmp_results with each forked process, which is definitely not necessary, since each process is not reading or modifying other entries: it is just adding new ones. Is there a better way I can be doing this?
I'm also slightly unsure about my %cache = %{$result};. If this just creates another reference to the hash, then cool, but if it's doing a copy, that's also a bit wasteful!
Any help will be greatly appreciated!
Documentation of the module can be found on CPAN. There's one part talking about performance:
Also, if each loop sub returns a massive amount of data, this needs to be communicated back to the parent process, and again that could outweigh parallel performance gains unless the loop body does some heavy work too.
You are probably moving around complete copies of the variables in memory, pushing the machine to its limit if the MIB to poll and the number of machines are big enough.
Since what you are doing is an I/O-intensive task, not a CPU-bound task that could benefit from parallel CPU processing, I would reconsider the approach of launching so many (50!) worker processes for polling.
Run the program with $maxProcs down to 1 to 5 processes and see how it behaves. Do some profiling of your code, attaching Devel::NYTProf, to check where you are consuming time and whether increasing the number of processes actually leads to better performance.
Reconsider using Parallel::Loops for this task. You may get better performance with use threads and a hash shared between the different threads (use threads::shared).
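A rough, untested sketch of that threads-based alternative (getSNMPIFFull and the layout of @devices are taken from the question; the worker count and everything else here are assumptions):

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my %snmp_results :shared;
my @exceptions   :shared;

# All work is queued up front, so workers can stop when dequeue_nb returns nothing.
my $queue = Thread::Queue->new(@devices);

my @workers = map {
    threads->create(sub {
        while (defined(my $dev = $queue->dequeue_nb)) {
            my ($name, $community, $snmp_ver) = @$dev;
            my $result = getSNMPIFFull($name, $community, $snmp_ver);
            if (defined $result && $result ne "") {
                lock(%snmp_results);
                # shared_clone makes the nested structure visible to all threads
                $snmp_results{$name} = shared_clone($result);
            }
            else {
                lock(@exceptions);
                push @exceptions, "Unable to poll $name - $community - $snmp_ver";
            }
        }
    });
} 1 .. 10;   # a more modest number of workers than 50

$_->join for @workers;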
Apologies if this could have been a comment; starting out on SO is difficult due to all the limitations that are in place :(
If you have already found a solution, it would be great if you could share your findings. I didn't know about Parallel::Loops before, and I think I can put it to some use.

Add parallelism to perl script

I have a small Perl script which gets service details from MongoDB, queries their statuses, and produces HTML output.
#...some stuff to get $token
# (modules used below; assumed for this reduced fragment)
use IPC::Run qw(run);
use LWP::UserAgent;
use HTTP::Request;
use JSON::XS;
use Try::Tiny;

my $json = JSON::XS->new;   # incremental JSON parser used below

my @cmd = ('/opt/mongo/bin/mongo', '127.0.0.1:27117/service_discovery', '--quiet', '-u', 'xxx', '-p', 'xxx', '--eval', "var environ='$env'; var action='status'", '/home/mongod/www/cgi/getstatus.js');
my $mongo_out;
run \@cmd, '>>', \$mongo_out;
$json->incr_parse($mongo_out);
while (my $obj = $json->incr_parse) {
    my $hostname = $obj->{'hostname'};
    print "<tr><td colspan=4 align=\"center\"><h4>$hostname</h4></td></tr>";
    foreach my $service (@{$obj->{'services'}}) {
        my $name  = $service->{'name'};
        my $port  = $service->{'port'};
        my $proto = $service->{'proto'};
        my $request = HTTP::Request->new(GET => "${proto}://$hostname:${port}/status/service");
        $request->header(Authorization => "Bearer $token");
        my $ua = LWP::UserAgent->new;
        $ua->timeout(2);
        my $response = $ua->request($request);
        my $code = $response->code();
        if ($code == 200) {
            my $var = $response->content();
            my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
            my $out = try { $coder->decode($var) } catch { undef };
            if (exists $out->{'name'} && exists $out->{'version'}) {
                print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td align=\"center\">$out->{'name'}</td><td align=\"center\">$out->{'version'}</td></tr>";
            } else {
                print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td colspan=2 align=\"center\">auth failed</td></tr>";
            }
        } elsif ($code == 500) {
            print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td colspan=2 align=\"center\">offline</td></tr>";
        } elsif ($code == 404) {
            print "<tr><td align=\"center\">$port</td><td align=\"center\">$name</td><td colspan=2 align=\"center\">page not found</td></tr>";
        }
    }
}
It takes a while to execute, especially when some services are offline. Is it possible to query the services on the same host simultaneously?
This is almost a question that's too broad to answer, because ... it depends.
But yes. You have two and a half mechanisms for parallelising in Perl:
threads
fork
non-blocking IO
I say two and a half because non-blocking IO isn't really parallel so much as a different way of solving the same problem.
Implementing parallelism is a really good way to end up with some horrific and hard-to-trace bugs, and it requires a bit of a mindset shift, because your code is no longer executing in a well-defined sequence - the whole point is that different bits might run at different times, and that can cause utter chaos.
Not least because the modules you import might well not be "thread safe" (which means they may usually be fine, but occasionally break in a very unpredictable way, and you'll tear your hair out trying to track down the bug).
So, with that in mind:
threads
Perhaps slightly counter-intuitively, if you've used threads in another language: Perl threads are NOT lightweight. There is a significant cost to starting them, not least because you effectively end up multiplying your memory footprint by the number of threads you are running.
I would normally suggest as a result - look at a "worker threads" model, using Thread::Queue. You start up a number of threads, and use queues to serialise the input and output from the threads.
forking
fork() is a unix-native system call. It's used a lot, and it's quite efficient. It splits your program into two identical copies - including position within the code - at the point at which it's called. The only difference initially is the return value of the fork() call: the parent gets the process ID of the child, and the child gets zero.
It's quite easy to do strange things accidentally, as both copies are, at that moment, at exactly the same point in terms of loop iterations, file handles, etc., but they rapidly diverge, and you can again end up with some very strange things happening if you interact with 'shared' resources.
I would normally suggest looking at the Parallel::ForkManager module as an easy way to avoid tripping yourself up with fork().
non-blocking IO
You can often use something like IO::Select and its can_read method, which tells you which file handles can be read without blocking, so you can skip the ones that would block until they are ready. This would also work for your use case, although it's not always applicable.
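As a very rough, untested sketch of that third option (plain HTTP/1.0 over raw sockets, no TLS, no real response parsing; the host/port list is hypothetical):

use strict;
use warnings;
use IO::Select;
use IO::Socket::INET;

my @targets = ( ['host1.example', 8080], ['host2.example', 8081] );   # hypothetical

my (%buf, %sock);
my $sel = IO::Select->new;
for my $t (@targets) {
    my ($host, $port) = @$t;
    # Note: the connects here still happen one after another.
    my $s = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port, Timeout => 2)
        or next;
    print {$s} "GET /status/service HTTP/1.0\r\nHost: $host\r\n\r\n";
    $sel->add($s);
    $sock{$s} = "$host:$port";
}

# Read from whichever sockets have data, instead of waiting on them one by one.
while ($sel->count) {
    my @ready = $sel->can_read(2) or last;   # give up after 2s of silence
    for my $s (@ready) {
        my $n = sysread($s, my $chunk, 4096);
        if ($n) {
            $buf{ $sock{$s} } .= $chunk;
        } else {
            $sel->remove($s);   # EOF or error: this response is complete
            close $s;
        }
    }
}

print "$_ returned ", length($buf{$_} // ''), " bytes\n" for values %sock;

In practice, an existing event-driven HTTP client such as AnyEvent::HTTP or HTTP::Async is usually less painful than hand-rolling select() handling like this.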
I've got examples of both the above here: Perl daemonize with child daemons

Ways to do timeouts in Perl?

I frequently use the following pattern to set an upper bound to the running time of a particular code fragment in Perl:
my $TIMEOUT_IN_SECONDS = 5;
eval {
    local $SIG{ALRM} = sub { die "alarm\n" };
    alarm($TIMEOUT_IN_SECONDS);
    # do stuff that might timeout.
    alarm(0);
};
if ($@) {
    # handle timeout condition.
}
My questions:
Is this the right way to do it?
Are there any circumstances under which the running time can exceed $TIMEOUT_IN_SECONDS, or is the above method bullet-proof?
You probably want to look at Sys::SigAction. I haven't used it myself, but it has some glowing reviews.
One thing to watch out for is if "stuff that might timeout" uses sleep or alarm itself. Also, in the error handling code, I assume you're prepared for errors other than a timeout.
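For reference, a minimal sketch of Sys::SigAction's timeout_call (not from the original answer; do_slow_thing() is a hypothetical placeholder, and the module's docs cover the exact semantics):

use Sys::SigAction qw(timeout_call);

my $timed_out = timeout_call(5, sub {
    do_slow_thing();   # stuff that might time out
});
if ($timed_out) {
    # handle timeout condition
}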
You could also try Time::Out. I like the syntax, and nested timeouts are supported.
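A small sketch of Time::Out's block syntax (again just an illustration; do_slow_thing() is a placeholder):

use Time::Out qw(timeout);

timeout 5 => sub {
    do_slow_thing();   # stuff that might time out
};
if ($@) {
    # the block above timed out (or died for some other reason)
}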
Take care with signal handling. Perl receives signals asynchronously and they may be lost or interfere with each other if a signal is received while another signal is being handled by the callback.
Event-handling libraries' Win32 support is pretty so-so in Perl (I have to support non-cygwin Win32), so I generally use a simple polling loop for timeouts:
use Time::HiRes qw(sleep);

sub timeout {
    my $timeout        = shift;
    my $poll_interval  = shift;
    my $test_condition = shift;
    until ($test_condition->() || $timeout <= 0) {
        $timeout -= $poll_interval;
        sleep $poll_interval;
    }
    return $timeout > 0;   # condition was met before timeout
}

my $success = timeout(30, 0.1, \&some_condition_is_met);
The sleep timer can easily be made user- or caller-configurable, and unless you are doing an extremely tight loop or have multiple callers waiting on the loop (where you can end up with a race or deadlock), it is a simple, reliable, and cross-platform way to implement a timeout.
Also note that the loop overhead means you cannot guarantee the timeout is observed absolutely: $test_condition, the decrement, garbage collection, etc. can interfere.

How can I make my Perl script use multiple cores for child processes?

I'm working on a mathematical model that uses data generated from XFOIL, a popular aerospace tool used to find the lift and drag coefficients on airfoils.
I have a Perl script that calls XFOIL repeatedly with different input parameters to generate the data I need. I need XFOIL to run 5,600 times, at around 100 seconds per run, so it will take about 6.5 days to complete.
I have a quad-core machine, but my experience as a programmer is limited, and I really only know how to use basic Perl.
I would like to run four instances of XFOIL at a time, all on their own core. Something like this:
while ( 1 ) {
    for ( i = 1..4 ) {
        if ( ! exists XFOIL_instance(i) ) {
            start_new_XFOIL_instance(i, input_parameter_list);
        }
    }
}
So the program checks (or preferably sleeps) until an XFOIL instance is free, at which point we can start a new instance with the next input parameter list.
Try Parallel::ForkManager. It's a module that provides a simple interface for forking off processes like this.
Here's some example code:
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my @input_parameter_list =
    map { join '_', ('param', $_) }
    ( 1 .. 15 );

my $n_processes = 4;
my $pm = Parallel::ForkManager->new( $n_processes );
for my $i ( 1 .. $n_processes ) {
    $pm->start and next;
    my $count = 0;
    foreach my $param_set (@input_parameter_list) {
        $count++;
        if ( ( $count % $i ) == 0 ) {
            if ( !output_exists($param_set) ) {
                start_new_XFOIL_instance($param_set);
            }
        }
    }
    $pm->finish;
}
$pm->wait_all_children;

sub output_exists {
    my $param_set = shift;
    return ( -f "$param_set.out" );
}

sub start_new_XFOIL_instance {
    my $param_set = shift;
    print "starting XFOIL instance with parameters $param_set!\n";
    sleep( 5 );
    touch( "$param_set.out" );
    print "finished run with parameters $param_set!\n";
}

sub touch {
    my $fn = shift;
    open FILE, ">$fn" or die $!;
    close FILE or die $!;
}
You'll need to supply your own implementations for the start_new_XFOIL_instance and the output_exists functions, and you'll also want to define your own sets of parameters to pass to XFOIL.
It looks like you could use Gearman for this project.
www.gearman.org
Gearman is a job queue. You can split your workflow into a lot of mini parts.
I would recommend using Amazon EC2, or even their auctionable (spot) instances, to complete this project.
Spending 10 cents per computing hour or less can significantly speed up your project.
I would use Gearman locally, and make sure you have a "perfect" run for 5-10 of your subjobs before handing it off to an Amazon compute farm.
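For a flavour of what that looks like in Perl, here is a hedged sketch using the Gearman::Worker and Gearman::Client modules (the gearmand address, the run_xfoil function name, and the start_new_XFOIL_instance helper from the earlier answer are assumptions/stand-ins):

# Worker side: registers a function and runs XFOIL jobs handed to it.
use Gearman::Worker;

my $worker = Gearman::Worker->new;
$worker->job_servers('127.0.0.1:4730');          # assumed gearmand address
$worker->register_function(run_xfoil => sub {
    my $job = shift;
    my $param_set = $job->arg;
    return start_new_XFOIL_instance($param_set); # hypothetical wrapper around XFOIL
});
$worker->work while 1;

And the client queues one job per parameter set:

# Client side
use Gearman::Client;

my $client = Gearman::Client->new;
$client->job_servers('127.0.0.1:4730');
my $taskset = $client->new_task_set;
for my $param_set (@input_parameter_list) {
    $taskset->add_task(run_xfoil => $param_set, {
        on_complete => sub { print "done: ${ $_[0] }\n" },
    });
}
$taskset->wait;   # block until all queued jobs have finished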
Perl threads will take advantage of multiple cores and processors. The main pro of threads is that it's fairly easy to share data between them and coordinate their activities. Forked processes cannot easily return data to the parent nor coordinate amongst themselves.
The main cons of Perl threads are that they are relatively expensive to create compared to a fork (they must copy the entire program and all its data), you must have them compiled into your Perl, and they can be buggy - the older the Perl, the buggier the threads. If your work is expensive, the creation time should not matter.
Here's an example of how you might do it with threads. There are many ways to do it; this one uses Thread::Queue to create a big list of work your worker threads can share. When the queue is empty, the threads exit. The main advantages are that it's easier to control how many threads are active, and you don't have to create a new, expensive thread for each bit of work.
This example shoves all the work into the queue at once, but there's no reason you can't add to the queue as you go. If you were to do that, you'd use dequeue instead of dequeue_nb, which will wait around for more input.
use strict;
use warnings;
use threads;
use Thread::Queue;

# Dummy work routine
sub start_XFOIL_instance {
    my $arg = shift;
    print "$arg\n";
    sleep 1;
}

# Read in dummy data
my @xfoil_args = <DATA>;
chomp @xfoil_args;

# Create a queue to push work onto and the threads to pull work from.
# Populate it with all the data up front so threads can finish when
# the queue is exhausted. Makes things simpler.
# See https://rt.cpan.org/Ticket/Display.html?id=79733
my $queue = Thread::Queue->new(@xfoil_args);

# Create a bunch of threads to do the work
my @threads;
for (1..4) {
    push @threads, threads->create( sub {
        # Pull work from the queue, don't wait if it's empty
        while( my $xfoil_args = $queue->dequeue_nb ) {
            # Do the work
            start_XFOIL_instance($xfoil_args);
        }
        # Yell when the thread is done
        print "Queue empty\n";
    });
}

# Wait for threads to finish
$_->join for @threads;
__DATA__
blah
foo
bar
baz
biff
whatever
up
down
left
right
Did you consider GNU parallel?
It will allow you to run several instances of your program with different inputs and fill your CPU cores as they become available. It's often a very simple and efficient way to achieve parallelization of simple tasks.
This is quite old, but if someone is still looking for suitable answers to this question, you might want to consider the Perl Many-Core Engine (MCE).
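As a taste of what that looks like, a small sketch with MCE::Loop (start_new_XFOIL_instance and the parameter list are stand-ins borrowed from the earlier answers, not MCE specifics):

use strict;
use warnings;
use MCE::Loop;

MCE::Loop::init { max_workers => 4, chunk_size => 1 };

my @input_parameter_list = map { "param_$_" } 1 .. 15;

# Each parameter set is handed to one of the 4 workers.
mce_loop {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    for my $param_set (@$chunk_ref) {
        start_new_XFOIL_instance($param_set);   # stand-in for the real XFOIL run
    }
} @input_parameter_list;

sub start_new_XFOIL_instance {
    my $param_set = shift;
    print "running XFOIL with $param_set\n";
}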