Queue manager with nested closures, commands, and dependencies - command

I am pretty new to Laravel and am trying to create queues to handle my processing scripts.
The processing scripts will be triggered by a nightly cron job.
I have beanstalkd set up and working with queues already.
I also have very simple commands set up which will allow me to start a script using artisan.
The main issue I am having is that I have dependencies in my scripts.
For example:
A -> B -> C
       -> D
Script A needs to run successfully first, then B needs to run successfully, and then C and D need to run (the order of C and D doesn't really matter).
Each of these is a very different task and I want to keep them separate, but I need to manage the dependencies in my queue. I already have models and libraries created to handle my processing, and that all works great.
Since I already have my logic in the models, I only need a small call for my queue.
Initially I wanted to just use closures and write it all in one file, a simple example is below.
Queue::push(function($jobA) use ($id)
{
    ProcessA::doStuff($id);
    $jobA->delete();
});
That works, but this is just for A, I still need to add B, C, and D.
So then I decided to add a closure inside of that.
Queue::push(function($jobA) use ($id)
{
    $statusA = ProcessA::doStuff($id);
    if($statusA)
    {
        Queue::push(function($jobB) use ($id)
        {
            ProcessB::doStuff($id);
            $jobB->delete();
        });
    }
    $jobA->delete();
});
This does not work.
There seems to be an issue with running a closure inside of a closure.
There may also be an issue with jobA possibly being deleted before jobB even starts.
That example was a little more simplified than what I actually need to do.
Here is a better example: after A completes, I need to get an array of things, loop through each of them, and run B for each one. Then, as I said earlier, C and D are spawned off of B... but I have yet to even get to the C and D part since this is getting a bit complicated already.
Queue::push(function($jobA) use ($id)
{
    $statusA = ProcessA::doStuff($id);
    if($statusA)
    {
        $things = getThings($id);
        foreach($things as $thing)
        {
            Queue::push(function($jobB) use ($thing)
            {
                ProcessB::doStuff($thing);
                $jobB->delete();
            });
        }
    }
    $jobA->delete();
});
So nested closures still do not work.
My next step was to not nest the closures but instead write classes.
To kick off the A process I would have to do
Queue::push('ProcessA', array('id' => $id));
Then
class ProcessA {
    public function fire($job, $data)
    {
        $things = getThings($data['id']);
        foreach($things as $thing)
        {
            Queue::push('ProcessB', array('thing' => $thing));
        }
        $job->delete();
    }
}
Then
class ProcessB {
    public function fire($job, $thing)
    {
        Queue::push('ProcessC', array('id' => $thing['id']));
        Queue::push('ProcessD', array('id' => $thing['id']));
        $job->delete();
    }
}
ProcessC and ProcessD are irrelevant at this point, as long as they run.
This method seems to work, but I think I was sometimes having issues when ProcessA would completely finish before ProcessB was spawned. It was a long day figuring these things out and maybe this does work fine... I need to test some more.
Also, this method requires me to create a separate file for each ProcessA/B/C/D I am running, due to the autoloading magic in Laravel or PHP 5.4. And as you can see, each of these is pretty short. This just seems like a fairly complex way of doing things.
Does this method seemed flawed in any ways?
Should I be doing this in some other way?
Should I even bother with initializing this as artisan commands and switch to routes instead and call them with curl?
Is there any way to make nested closures work?

The simplest way I can come up with: Just fire an event when your first closure is done with its work, and then listen to that event somewhere else. When you catch it, push the next job onto the queue etc.

Related

Perl, SVN::Hooks, and SVN::Hooks::Notify

Can anyone tell me offhand if SVN::Hooks::Notify's NOTIFY and NOTIFY_DEFAULTS blocks replace (or rather, stop the evaluation of) the POST_COMMIT block? My PRE_COMMIT block works fine, and my existing NOTIFY/NOTIFY_DEFAULTS blocks process just fine.
However, nothing I have under the POST_COMMIT block fires at all... and yes, hooks/post-commit is linked to the script. The perldocs for SVN::Hooks::Notify state that it runs within POST_COMMIT, but I would prefer to do some extra processing first before kicking off a notification email (e.g., inserting pertinent information into a db table for later use).
The NOTIFY block sets a post-commit hook; there is no separate hook for notify.
And as far as I can tell from the SVN::Hooks source, you can set as many of a given hook as you want, and they will run in the order you add them. So you may need to do e.g.:
use SVN::Hooks;
BEGIN {
    POST_COMMIT { ... };
}
use SVN::Hooks::Notify;
to have your other hook come before the notify hook.

Unexpected behavior in nested recursive function

I have some code that behaves rather strangely.
I am inside a function, and I declare a nested one, which should check if something isn't okay. If it's not then it should sleep for five seconds and call itself again.
sub stop {
    sub wait_for_stop {
        my $vm_view = shift;
        if ( $vm_view->runtime->powerState->val ne "poweredOff" ) {
            debug("...");
            sleep(5);
            wait_for_stop();
        }
    }
    debug("Waiting for the VM to stop");
    wait_for_stop( @$vm_views[0] );
}
So, in the call that results in the recursion inside the if condition, if I put the parameter (as the function definition expects it), like this:
wait_for_stop($vm_view);
I get an infinite loop.
If I leave it without a parameter, as in the code example above, it works as expected.
Shouldn't $vm_view in the subsequent calls be empty? Or the last used value ($vm_view->runtime->powerState->val)? Either case should result in unexpected behavior and error.
But it works without any parameter. So why is that? Is there something I've missed from perldoc?
EDIT1: Actually, the value of $vm_views does get changed, so that's not the reason for the infinite loop.
General clarifications
I am using the VMware SDK. The $vm_views object contains the VM details. I am polling one of its methods to detect a change, in this particular case, I need to know when the machine is turned off. So, for lack of a better way, I make a call every 5 seconds until the value is satisfactory.
My purpose is to stop the VM, make modifications that can only be made while it's off, and then launch it.
Actual question
When I don't pass a parameter, the block works as expected – it waits until the value is poweredOff (the VM is off), and continues, which doesn't make much sense, at least to me.
In the case I put $vm_view as parameter, I get an infinite loop (the value will still get changed, since I'm calling a method).
So I am wondering why the function works, when after the first call, $vm_view should be undef, and therefore, be stuck in an infinite loop? [undef ne "poweredOff" -> sleep -> recursion till death]
And why, when I pass the expected value, it gets stuck?
PS: To all those saying my recursion is weird and useless in this scenario – due to a myriad of reasons, I need to use such a format (it's better suited for my needs, since, after I get this bit working, I'll modify it to add various stuff and reuse it, and, for what I have in mind, a function seems to be the best option).
You should always look at your standard tools before going for something a little more exotic like recursion. All you need here is a while loop.
It's also worth noting that @$vm_views[0] should be $$vm_views[0] or, better, $vm_views->[0]. And you don't gain anything by defining a subroutine inside another one -- the effect is the same as if it was declared separately afterwards.
An infinite loop is what I would expect if $vm_view->runtime->powerState->val never returns poweredOff, and the code below won't fix that. I don't see any code that tells the VM to stop before you wait for the status change. Is that correct?
I don't understand why you say that your code works correctly when you call wait_for_stop without any parameters. You will get the fatal error
Can't call method "runtime" on an undefined value
and your program will stop. Is what you have posted the real code?
This will do what you intended. I also think it's easier to read
use strict;
use warnings;

my $vm_views;

sub stop {
    debug("Waiting for the VM to stop");
    my $vm_view = $vm_views->[0];
    while ( $vm_view->runtime->powerState->val ne 'poweredOff' ) {
        debug('...');
        sleep 5;
    }
}
I think you would have a better time not calling wait_for_stop() recursively. This way might serve you better:
sub stop
{
    sub wait_for_stop
    {
        my $vm_view = shift;
        if ($vm_view->runtime->powerState->val ne "poweredOff")
        {
            debug("...");
            #sleep(5);
            #wait_for_stop();
            return 0;
        }
        return 1;
    }
    debug("Waiting for the VM to stop");
    until(wait_for_stop(@$vm_views[0]))
    {
        sleep(5);
    }
}
Your old way was rather confusing and I don't think you were passing the $vm_view variable through to the recursive subroutine call.
Update:
I tried reading it here:
https://www.vmware.com/support/developer/viperltoolkit/doc/perl_toolkit_guide.html#step3
It says:
When the script is run, the VI Perl Toolkit runtime checks the
environment variables, configuration file contents, and command-line
entries (in this order) for the necessary connection-setup details. If
none of these overrides are available, the runtime uses the defaults
(see Table 1 ) to setup the connection.
So the "runtime" is using the default connection details even when a VM object is not defined? Maybe?
That still doesn't answer why it doesn't work when the parameter is passed.
You need to understand the VM SDK better. Your logic for recursion and usage of function parameters is fine.
Also, the page: https://www.vmware.com/support/developer/viperltoolkit/doc/perl_toolkit_guide.html
says -
VI Perl Toolkit views have several characteristics that you should
keep in mind as you write your own scripts. Specifically, a view:
Is a Perl object
Includes properties and methods that correlate to the properties and
operations of the server-side managed object
Is a static copy of one or more server-side managed objects, and as
such (static), is not updated automatically as the state of objects
on the server changes.
So what the "vm" function returns is a static copy, which can be updated from the script. Maybe it is getting updated when you make a call while passing $vm_view?
Old Answer:
The problem is not what you missed in the Perl docs; it is with your understanding of recursion.
The purpose of the recursion is to keep running until $vm_view->runtime->powerState->val becomes "poweredOff" and then cascade back. If you don't update the value, it keeps running forever.
When you say:
I get an infinite loop.
Are you updating the $vm_view within the if condition?
Otherwise, the variable is the same every time you call the function and hence you can end up in an infinite loop.
If I leave it without a parameter, as in the code example above, it
works as expected.
How can it work as expected? What is expected? There is no way the function can know what value your $vm_view is being updated with.
I have simplified the code, added updating a simple variable (similar to your $vm_view) and tested. Use this to understand what is happening:
sub wait_for_stop
{
    my $vm_view = shift;
    if ($vm_view < 10) {
        print "debug...\n";
        sleep(5);
        $vm_view++;   # update $vm_view->runtime->powerState->val here
                      # so that it becomes "poweredOff" at some point of time
                      # and breaks the recursion
        wait_for_stop($vm_view);
    }
}
wait_for_stop(1);
Let me know in comments how the variable is being updated and I will help resolve.

perl: function name clash

Let's say there are two modules in our framework: Handler.pm and Queries.pm.
Queries.pm is optional and is being loaded at fastcgi process startup
BEGIN {
    &{"check_module_$_"} () foreach (Queries);
}

sub check_module_queries {
    ...
    require Eludia::Content::Queries;
    ...
}
every module function is loaded in one common namespace
Now, there are two functions with the same name (setup_page_content) in Handler.pm and Queries.pm.
setup_page_content is being called in Handler.pm.
It looks like the original author intended that Queries.pm's setup_page_content would be called whenever Queries.pm is loaded.
Sometimes that doesn't happen: a traceback (obtained via caller()) in these cases indicates that setup_page_content was called from Handler.pm.
I logged %INC just before the call, and in these cases it contains Queries.pm and its full path.
This behaviour is inconsistent and shows up in roughly 2-3% of attempts on the production installation, mostly when I send two parallel identical HTTP requests. Due to the amount of effort needed to reproduce it, I haven't yet determined whether it is installation-specific.
How is it decided which version of a function with the same name will be called?
Is it well-defined behaviour?
There should be a reason the original author wrote the code this way.
UPDATE
perl version is v5.10.1 built for x86_64-linux-gnu-thread-multi
UPDATE 2
code: Handler.pm and Queries.pm
Queries.pm loading occurs in check_module_queries (BEGIN {} of Eludia.pm),
loaded for every request using Loader.pm (via use Loader.pm <params>)
I'd like to call Custom.pm's setup_page_content whenever Custom.pm is loaded.
So you'd like to call Custom::setup_page_content when it exists.
if (exists(&Custom::setup_page_content)) {
    Custom::setup_page_content(...);
} else {
    Handler::setup_page_content(...);
}
Sometimes it doesn't happen.
The total lack of information about this prevents me from commenting. Code? Anything?
Is there a way to 'unload' a module but leave it in %INC?
Loading a module is simply running it. You can't "unrun" code. At best, you can manually undo some of the effects of running it. Whether that includes changing %INC or not is entirely up to you.
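A rough sketch of the kind of manual cleanup that implies, assuming the colliding sub really lives in a Queries:: package (illustrative names; in your setup everything shares one namespace, so adjust accordingly):
undef &Queries::setup_page_content;   # drop the sub from the symbol table
# %INC is untouched here, so a later require of Queries.pm is still a no-op;
# delete $INC{'Queries.pm'} instead if you want the file to be re-run.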
When there is a namespace collision, the most recently defined/loaded function will take precedence.
These are not proper modules; they are simply libraries. If you turn them into proper modules with proper namespacing, you won't have this issue. You barely need to do anything more than:
package Queries;

sub new {
    my $proto = shift;
    my $class = ref($proto) || $proto;
    my $self  = {};
    bless($self, $class);
    return $self;
}
And you'll be well on your way. Of course your program will need a couple of changes to actually access the class functions. You can look at exporting the functions to possibly save some refactoring time.
I have forked the code at github and initiated a pull request that I hope will address your problem.
The behaviour is perfectly defined: when you call PACKAGENAME::function_name, PACKAGENAME::function_name is called. There cannot be more than one PACKAGENAME::function_name at a time.
It is possible to redefine each function as many times as you wish. When you, say, execute some code like
eval "sub PACKAGENAME::function_name {...}";
the function is [re]defined. When you require some .pm file, it's much like evaling a string containing its contents.
If PACKAGENAME is not defined explicitly, the __PACKAGE__ value is used.
So if you remark that the version of setup_page_content is taken from Handler.pm, it must indicate that Handler.pm's content was executed (by use, require, read/eval etc.) after Queries.pm was loaded (for the last time).
So you have to track all events of Handler.pm loading. The easy way is to add some debug print at the start of Handler.pm.
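A minimal sketch of such a debug print (the exact message format is up to you), placed near the top of Handler.pm:
# near the top of Handler.pm
my ($pkg, $file, $line) = caller;
warn sprintf "Handler.pm loaded at %s, required from %s line %s\n",
    scalar localtime, $file // '(command line)', $line // 0;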

Perl, how to fetch data from urls in parallel?

I need to fetch some data from many web data providers, who do not expose any service, so I have to write something like this, using for example WWW::Mechanize:
use WWW::Mechanize;
my @urls = ('http://www.first.data.provider.com', 'http://www.second.data.provider.com', 'http://www.third.data.provider.com');
my %results;
foreach my $url (@urls) {
    my $mech = WWW::Mechanize->new();
    $mech->get($url);
    $mech->form_number(1);
    $mech->set_fields('user' => 'myuser', pass => 'mypass');
    my $resp = $mech->submit();
    $results{$url} = parse($resp->content());
}
consume(%results);
consume(%results);
Is there some (possibly simple ;-) way to fetch data to a common %results variable, simultaneously, i.e: in parallel, from all the providers?
threads are to be avoided in Perl. use threads is mostly for
emulating UNIX-style fork on Windows; beyond that, it's pointless.
(If you care, the implementation makes this fact very clear. In perl,
the interpreter is a PerlInterpreter object. The way threads
works is by making a bunch of threads, and then creating a brand-new
PerlInterpreter object in each thread. Threads share absolutely
nothing, even less than child processes do; fork gets you
copy-on-write, but with threads, all the copying is done in Perl
space! Slow!)
If you'd like to do many things concurrently in the same process, the
way to do that in Perl is with an event loop, like
EV,
Event, or
POE, or by using Coro. (You can
also write your code in terms of the AnyEvent API, which will let
you use any event loop. This is what I prefer.) The difference
between the two is how you write your code.
AnyEvent (and EV, Event,
POE, and so on) forces you to write your code in a callback-oriented
style. Instead of control flowing from top to bottom, control is in a
continuation-passing style. Functions don't return values, they call
other functions with their results. This allows you to run many IO
operations in parallel -- when a given IO operation has yielded
results, your function to handle those results will be called. When
another IO operation is complete, that function will be called. And
so on.
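For concreteness, here is a minimal sketch (not part of the answer above, and with placeholder URLs) of what that callback style looks like using AnyEvent::HTTP; notice that the result handling lives inside the callback rather than after the call returns:
#!/usr/bin/env perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my @urls = ('http://www.example.com/', 'http://www.example.org/');   # placeholders
my $cv   = AnyEvent->condvar;

for my $url (@urls) {
    $cv->begin;                       # one more outstanding request
    http_get $url, sub {
        my ($body, $headers) = @_;    # control arrives here when the response is in
        printf "%s: %d bytes\n", $url, length($body // '');
        $cv->end;                     # this request is finished
    };
}

$cv->recv;                            # run the event loop until every begin has a matching end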
The disadvantage of this approach is that you have to rewrite your
code. So there's a module called Coro that gives Perl real
(user-space) threads that will let you write your code top-to-bottom,
but still be non-blocking. (The disadvantage of this is that it
heavily modifies Perl's internals. But it seems to work pretty well.)
So, since we don't want to rewrite
WWW::Mechanize
tonight, we're going to use Coro. Coro comes with a module called
Coro::LWP that will make
all calls to LWP be
non-blocking. It will block the current thread ("coroutine", in Coro
lingo), but it won't block any other threads. That means you can make
a ton of requests all at once, and process the results as they become
available. And Coro will scale better than your network connection;
each coroutine uses just a few k of memory, so it's easy to have tens
of thousands of them around.
With that in mind, let's see some code. Here's a program that starts
three HTTP requests in parallel, and prints the length of each
response. It's similar to what you're doing, minus the actual
processing; but you can just put your code in where we calculate the
length and it will work the same.
We'll start off with the usual Perl script boilerplate:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';   # the snippets below use say()
Then we'll load the Coro-specific modules:
use Coro;
use Coro::LWP;
use EV;
Coro uses an event loop behind the scenes; it will pick one for you if
you want, but we'll just specify EV explicitly. It's the best event
loop.
Then we'll load the modules we need for our work, which is just:
use WWW::Mechanize;
Now we're ready to write our program. First, we need a list of URLs:
my @urls = (
    'http://www.google.com/',
    'http://www.jrock.us/',
    'http://stackoverflow.com/',
);
Then we need a function to spawn a thread and do our work. To make a
new thread on Coro, you call async like async { body; of the
thread; goes here }. This will create a thread, start it, and
continue with the rest of the program.
sub start_thread($) {
    my $url = shift;
    return async {
        say "Starting $url";
        my $mech = WWW::Mechanize->new;
        $mech->get($url);
        printf "Done with $url, %d bytes\n", length $mech->content;
    };
}
So here's the meat of our program. We just put our normal LWP program
inside async, and it will be magically non-blocking. get blocks,
but the other coroutines will run while waiting for it to get the data
from the network.
Now we just need to start the threads:
start_thread $_ for @urls;
And finally, we want to start handling events:
EV::loop;
And that's it. When you run this, you'll see some output like:
Starting http://www.google.com/
Starting http://www.jrock.us/
Starting http://stackoverflow.com/
Done with http://www.jrock.us/, 5456 bytes
Done with http://www.google.com/, 9802 bytes
Done with http://stackoverflow.com/, 194555 bytes
As you can see, the requests are made in parallel, and you didn't have
to resort to threads!
Update
You mentioned in your original post that you want to limit the number of HTTP requests that run in parallel. One way to do that is with a semaphore,
Coro::Semaphore in Coro.
A semaphore is like a counter. When you want to use the resource that a semaphore protects, you "down" the semaphore. This decrements the counter and continues running your program. But if the counter is at zero when you try to down the semaphore, your thread/coroutine will go to sleep until it is non-zero. When the count goes up again, your thread will wake up, down the semaphore, and continue. Finally, when you're done using the resource that the semaphore protects, you "up" the semaphore and give other threads the chance to run.
This lets you control access to a shared resource, like "making HTTP requests".
All you need to do is create a semaphore that your HTTP request threads will share:
my $sem = Coro::Semaphore->new(5);
The 5 means "let us call 'down' 5 times before we block", or, in other words, "let there be 5 concurrent HTTP requests".
Before we add any code, let's talk about what can go wrong. Something bad that could happen is a thread "down"-ing the semaphore, but never "up"-ing it when it's done. Then nothing can ever use that resource, and your program will probably end up doing nothing. There are lots of ways this could happen. If you wrote some code like $sem->down; do something; $sem->up, you might feel safe, but what if "do something" throws an exception? Then the semaphore will be left down, and that's bad.
Fortunately, Perl makes it easy to have scope Guard objects, that will automatically run code when the variable holding the object goes out of scope. We can make the code be $sem->up, and then we'll never have to worry about holding a resource when we don't intend to.
Coro::Semaphore integrates the concept of guards, meaning you can say my $guard = $sem->guard, and that will automatically down the semaphore and up it when control flows away from the scope where you called guard.
With that in mind, all we have to do to limit the number of parallel requests is guard the semaphore at the top of our HTTP-using coroutines:
async {
    say "Waiting for semaphore";
    my $guard = $sem->guard;
    say "Starting";
    ...;
    return result;
}
Addressing the comments:
If you don't want your program to live forever, there are a few options. One is to run the event loop in another thread, and then join on each worker thread. This lets you pass results from the thread to the main program, too:
async { EV::loop };

# start all threads
my @running = map { start_thread $_ } @urls;

# wait for each one to return
my @results = map { $_->join } @running;

for my $result (@results) {
    say $result->[0], ': ', $result->[1];
}
Your threads can return results like:
sub start_thread($) {
    my $url = shift;
    return async {
        ...;
        return [$url, length $mech->content];
    };
}
This is one way to collect all your results in a data structure. If you don't want to return things, remember that all the coroutines share state. So you can put:
my %results;
at the top of your program, and have each coroutine update the results:
async {
    ...;
    $results{$url} = 'whatever';
};
When all the coroutines are done running, your hash will be filled with the results. You'll have to join each coroutine to know when the answer is ready, though.
Finally, if you are doing this as part of a web service, you should use a coroutine-aware web server like Corona. This will run each HTTP request in a coroutine, allowing you to handle multiple HTTP requests in parallel, in addition to being able to send HTTP requests in parallel. This will make very good use of memory, CPU, and network resources, and will be pretty easy to maintain!
(You can basically cut-n-paste our program from above into the coroutine that handles the HTTP request; it's fine to create new coroutines and join inside a coroutine.)
Looks like ParallelUserAgent is what you're looking for.
Well, you could create threads to do it--specifically see perldoc perlthrtut and Thread::Queue. So, it might look something like this.
use WWW::Mechanize;
use threads;
use threads::shared;
use Thread::Queue;
my @urls = ( # whatever
);
my %results :shared;
my $queue = Thread::Queue->new();

foreach (@urls)
{
    $queue->enqueue($_);
}

my @threads = ();
my $num_threads = 16; # Or whatever...a pre-specified number of threads.

foreach (1..$num_threads)
{
    push @threads, threads->create(\&mechanize);
}

foreach (@threads)
{
    $queue->enqueue(undef);
}

foreach (@threads)
{
    $_->join();
}

consume(\%results);

sub mechanize
{
    while (my $url = $queue->dequeue)
    {
        my $mech = WWW::Mechanize->new();
        $mech->get($url);
        $mech->form_number(1);
        $mech->set_fields('user' => 'myuser', pass => 'mypass');
        my $resp = $mech->submit();
        $results{$url} = parse($resp->content());
    }
}
Note that since you're storing your results in a hash (instead of writing stuff to a file), you shouldn't need any kind of locking unless there's a danger of overwriting values. In which case, you'll want to lock %results by replacing
$results{$url} = parse($resp->content());
with
{
    lock(%results);
    $results{$url} = parse($resp->content());
}
Try https://metacpan.org/module/Parallel::Iterator -- saw a very good presentation about it last week, and one of the examples was parallel retrieval of URLs -- it's also covered in the pod example. It's simpler than using threads manually (although it uses fork underneath).
As far as I can tell, you'd still be able to use WWW::Mechanize, but avoid messing about with memory sharing between threads. It's a higher-level model for this task, and might be a little simpler, leaving the main logic of @Jack Maney's mechanize routine intact.
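As a rough, untested sketch of how that might look (based on my reading of the Parallel::Iterator POD, so double-check the exact calling convention of iterate_as_array before relying on it), with the fetching and parsing kept in a worker sub much like the mechanize routine above:
use WWW::Mechanize;
use Parallel::Iterator qw( iterate_as_array );

my @urls = ('http://www.example.com/', 'http://www.example.org/');   # placeholders

# The worker receives (index, value); its return value is collected at the
# same index in the output list. Each worker runs in a forked child process.
my @pairs = iterate_as_array(
    sub {
        my ($i, $url) = @_;
        my $mech = WWW::Mechanize->new();
        $mech->get($url);
        return [ $url, parse($mech->content()) ];   # parse() as in the question
    },
    \@urls,
);

my %results = map { $_->[0] => $_->[1] } @pairs;
consume(%results);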

What Event Handler to Subscribe and Assert.AreEqual Fails

I want to invoke a method when my integration test fails (i.e., Assert.AreEqual fails), is there any event delegate that I can subscribe to in NUnit framework? Of course the event delegate must be fired when the tests fail.
This is because my tests contain a lot of Assert statements, and I need to log the failing Assert, along with the assertion information, in my bug tracker. The best way to do this would be to invoke a delegate method when a test fails.
I'm curious, what problem are you trying to solve by doing this? I don't know of a way to do this with NUnit, but there is a messy way to do it. You can supply a Func and invoke it as the fail message. The body of the Func can provide the delegation you're looking for.
var callback = new Func<string>(() => {
// do something
return "Reason this test failed";
});
Assert.AreEqual("", " ", callback.Invoke());
It seems that there is no event I can subscribe to in case of an assert failure.
This sounds like you are wanting to create your own test-runner.
There is a distinction between the tests (which define the actions) and the running of the tests (actually executing them). In this case, if you want to detect that a test failed, you would have to do this when the test is run.
If all you are looking to do is send an email on fail, it may be easiest to create a script (powershell/batch) that runs the command line runner, and sends an email on fail (to your bug tracking system?)
If you want more complex interactivity, you may need to consider creating a custom runner. The runner can run the test and then take action based on results. In this case you should look in the Nunit.Core namespace for the test runner interface.
For example (untested):
TestPackage package = new TestPackage();
package.add(@"c:\some\path\to.dll");
SimpleTestRunner runner = new SimpleTestRunner();
if (runner.Load(package)) {
    var results = runner.Run(new NullListener(), TestFilter.Empty, false, LoggingThreshold.Off);
    if (results.ResultState != ResultState.Success) {
        // ... do something interesting ...
    }
}
EDIT: better snippet of code https://stackoverflow.com/a/5241900/1961413