Race between pool requests in Guzzle

I'm making multiple concurrent API requests using a Guzzle Pool.
Everything's working fine.
But I want to stop/avoid all remaining requests as soon as any one of the requests responds. That is, I want to race the requests against each other. Is this possible using Guzzle in Laravel?
Here's what I've done so far:
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$requests = function (array $urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};

$pool = new Pool($client, $requests($urls), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) use ($urls) {
        echo "<br>Completed " . $urls[$index];
    },
    'rejected' => function ($reason, $index) {
        echo "Rejected " . $index;
    },
]);

$promise = $pool->promise();
$promise->wait();
Here, $urls is an array of URIs.

I don't think it is possible with the current implementation of the Guzzle Pool. The only thing you can do with it is call exit; in the fulfilled callback:
'fulfilled' => function ($response, $index) use ($urls) {
    echo "Completed " . $urls[$index];
    exit;
},
In this case it will still send all the requests, but the script will exit immediately on the fastest response.
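If you'd rather not kill the whole script, there is a softer trick that exploits the fact that the Pool consumes the request generator lazily: stop yielding once a winner has been seen. This is only a sketch (the shared $done flag is my own addition, not a Guzzle feature), and requests already in flight will still complete:

$done = false;

$requests = function (array $urls) use (&$done) {
    foreach ($urls as $url) {
        // stop queueing new requests once one has been fulfilled
        if ($done) {
            return;
        }
        yield new Request('GET', $url);
    }
};

$pool = new Pool($client, $requests($urls), [
    'concurrency' => 5,
    'fulfilled' => function ($response, $index) use (&$done, $urls) {
        $done = true;
        echo "Completed " . $urls[$index];
    },
]);

$pool->promise()->wait();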
Without the Pool you can use GuzzleHttp\Promise\any or GuzzleHttp\Promise\some helper functions:
use GuzzleHttp\Client;
use GuzzleHttp\Promise;

$client = new Client(['base_uri' => 'http://site.local/']);

// Initiate each request but do not block
$promises = [
    'delay3' => $client->getAsync('/async/delay3.php'),
    'delay2' => $client->getAsync('/async/delay2.php'),
    'delay1' => $client->getAsync('/async/delay1.php'),
];

// Initiate a competitive race between multiple promises
$promise = Promise\any($promises)->then(
    function (\GuzzleHttp\Psr7\Response $response) {
        echo "Completed: " . $response->getStatusCode() . "\n";
        echo $response->getBody() . "\n";
    },
    function ($reason) {
        echo $reason;
    }
);

$results = $promise->wait();
From the docs for GuzzleHttp\Promise\some($count, $promises):
Initiate a competitive race between multiple promises or values
(values will become immediately fulfilled promises).
When count amount of promises have been fulfilled, the returned
promise is fulfilled with an array that contains the fulfillment
values of the winners in order of resolution.
This promise is rejected with a {@see GuzzleHttp\Promise\AggregateException}
if the number of fulfilled promises is less than the desired $count.
From the docs for GuzzleHttp\Promise\any($promises):
Like some(), with 1 as count. However, if the promise fulfills, the
fulfillment value is not an array of 1 but the value directly.
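Note that any() only settles the combined promise; the losing requests keep running in the background. If you also want to abandon them, one sketch is to cancel the remaining promises from the then() callback. cancel() is a no-op on promises that have already settled, but whether the in-flight HTTP transfer is actually aborted depends on the handler, so treat this as best-effort:

$promise = Promise\any($promises)->then(
    function ($response) use ($promises) {
        // Best-effort: cancel whatever is still pending; cancelling an
        // already-settled promise has no effect.
        foreach ($promises as $p) {
            $p->cancel();
        }
        echo "Winner: " . $response->getStatusCode() . "\n";
    }
);

$results = $promise->wait();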

Related

Setting the "finish" event on a Mojolicious/Minion::Job

I am trying to have Minion jobs feed back to the Mojolicious web app upon completion (probably by posting a message to an API). The underlying idea here is then for the web app to feed back to the client that has uploaded/started the job.
I have tried doing this:
get '/' => sub {
    my $c = shift;
    my $id = $c->minion->enqueue('thing', [ qw/a b 1/, { foo => 'bar' } ]);
    my $job = $c->minion->job($id);
    $job->on(finish => sub ($job) {
        my $id   = $job->id;
        my $task = $job->task;
        $job->app->log->info(qq{Job "$id" was performed with task "$task"});
    });
    $c->render(template => 'index');
};
which doesn't work - I guess because the event is only emitted in the process performing the job, and the event does not get serialized and queued.
If I do this:
app->minion->add_task(thing => sub ($job, $c, $sub, @args) {
    $job->on(finish => sub ($job) {
        my $id   = $job->id;
        my $task = $job->task;
        $job->app->log->info(qq{Job "$id" was performed with task "$task"});
    });
    sleep 2;
});
it works ok, but it means I have to add the event handling to every task - which adds complexity to the code.
Is there a way to avoid having to do this?
I am thinking of:

- being able to set a default class for jobs (so that jobs are all a subclass of Minion::Job - for example Minion::Job::WithFeedback)
- better yet, being able to inject roles into task creation (so that you can do $c->minion->enqueue('thing', [ qw/a b 1/, { foo => 'bar' } ], { roles => [qw/+WithFeedback +WithTimeout/] }))
I know I could poll all jobs regularly and see what changed status - this is what the Minion::Admin plugin does - but I would like to see if there is a different way that doesn't require polling the database.
Is this possible? And while we're at it, is this a bad idea in and of itself?
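Not a full answer to the role-injection idea, but one way to avoid repeating the handler in every task is to attach it centrally in the worker process. This is a sketch that assumes a Minion recent enough to emit the worker event on the Minion object and the dequeue event on Minion::Worker; the backend DSN is a placeholder:

use Mojolicious::Lite -signatures;

plugin Minion => { Pg => 'postgresql://user@/minion_db' };  # placeholder DSN

app->minion->add_task(thing => sub ($job, @args) { sleep 2 });

# Runs once per worker process; attaches the finish handler to every
# job that worker dequeues, so the individual tasks stay clean.
app->minion->on(worker => sub ($minion, $worker) {
    $worker->on(dequeue => sub ($worker, $job) {
        $job->on(finish => sub ($job) {
            my $id   = $job->id;
            my $task = $job->task;
            $job->app->log->info(qq{Job "$id" was performed with task "$task"});
        });
    });
});

app->start;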

Delayed response to slash command with Mojolicious in Perl

I am trying to create a Slack application in Perl with Mojolicious, and I have the following use case:
Slack sends a request to my API from a slash command and needs a response within a 3-second timeframe. However, Slack also gives me the opportunity to send up to 5 more responses within a 30-minute timeframe, but it still needs the initial response within 3 seconds (it just sends a "late_response_url" in the initial callback so that I can POST something to that URL later on). In my case I would like to send an initial response to Slack to inform the user that the operation is "running", and after a while send the actual outcome of my slow function to Slack.
Currently, I can do this by spawning a second process using fork(): one process responds immediately, as Slack dictates, and the second does the rest of the work and responds later on.
I am trying to do this with Mojolicious' subprocesses to avoid using fork() directly, but I can't find a way to get it to work.
A sample of what I am already doing with fork() looks like this:
sub withpath {
    my $c = shift;
    my $user = $c->param('user_name');
    my $response_body = {
        response_type => "ephemeral",
        text => "Running for $user:",
        attachments => [
            { text => 'analyze' },
        ],
    };
    my $pid = fork();
    if ($pid != 0) {
        # parent: answer Slack within the 3-second window
        $c->render(json => $response_body);
    }
    else {
        # child: do the slow work, then POST to the late response URL
        my $output = do_time_consuming_things();
        $response_body = {
            response_type => "in-channel",
            text => "Result for $user:",
            attachments => [
                { text => $output },
            ],
        };
        my $ua = Mojo::UserAgent->new;
        my $tx = $ua->post(
            $response_url,
            { Accept => '*/*' },
            json => $response_body,
        );
        if (my $res = $tx->success) {
            print "\n success \n";
        }
        else {
            my $err = $tx->error;
            if ($err->{code}) {
                print "$err->{code} response: $err->{message}\n";
            }
            else {
                print "Connection error: $err->{message}\n";
            }
        }
        exit;  # the child must not fall back into the web app
    }
}
So the problem is that, no matter what I tried, I couldn't replicate this code with Mojolicious' subprocesses. Any ideas?
Thanks in advance!
Actually, I just found a solution to my problem! Here it is:
my $c = shift;                                  # receive request
my $user = $c->param('user_name');              # get parameters
my $response_url = $c->param('response_url');
my $text = $c->param('text');

# create the immediate response that Slack is waiting for
my $response_body = {
    response_type => "ephemeral",
    text => "Running for $user:",
    attachments => [
        { text => 'analyze' },
    ],
};

# create the subprocess
my $subprocess = Mojo::IOLoop::Subprocess->new;
$subprocess->run(
    # this callback is the actual subprocess that will run in the
    # background; it contains the POST request from my "fork" code
    # (with the output) that sends the late response to Slack
    sub { do_time_consuming_things($user, $response_url, $text) },

    # this callback runs in the parent once the subprocess is done;
    # it does nothing beyond error reporting, but Mojo needs it
    sub {
        my ($subprocess, $err, @results) = @_;
        say $err if $err;
        say "\n\nok\n\n";
    }
);

# and here is the actual immediate response, outside of the subprocess,
# so the server does not wait for the subprocess to finish before responding
$c->render(json => $response_body);
So I simply had to put my do_time_consuming_things code in the first callback (so it runs as a subprocess), use the second callback (which runs in the parent process) as a mostly empty one, and keep my immediate response in the main body of the function instead of putting it inside one of the callbacks. See the code comments for more information!
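For what it's worth, newer Mojolicious versions also offer a shortcut on Mojo::IOLoop that builds the subprocess for you; the two callbacks play the same roles as in the solution above (a sketch, not tested against Slack):

use Mojo::IOLoop;

# first callback runs in a forked child; second runs in the parent
# with ($subprocess, $err, @results) once the child has finished
Mojo::IOLoop->subprocess(
    sub { do_time_consuming_things($user, $response_url, $text) },
    sub {
        my ($subprocess, $err, @results) = @_;
        say $err if $err;
    }
);

$c->render(json => $response_body);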

Why would hot deploy of Hypnotoad rerun old http requests?

The nutshell:
When I do a hot deployment of Hypnotoad, sometimes the new server immediately processes a slew of HTTP requests that were already handled by the previous server.
If a response has been rendered but the thread is still doing some processing, does Mojo/Hypnotoad retain the request until the processing has stopped? Do I need to tell the server that the HTTP request is resolved?
The long version:
I have a Mojolicious::Lite app running under Hypnotoad.
The app's function is to accept HTTP requests from another service.
We are processing jobs that progress through a series of states.
At each job state change the app is notified with an HTTP request.
This is a busy little script, receiving more than 1000 requests/hour.
The script's job is to manipulate some data: DB updates, editing files, sending mail.
In an effort to keep things moving along, when it receives the HTTP request it sanity-checks the data it received. If the data looks good it sends a 200 response to the caller immediately and then continues on to the more time-consuming tasks. (I'm guessing this is the underlying cause.)
When I hot deploy - by rerunning the start script (which runs 'localperl/bin/hypnotoad $RELDIR/etc/bki/bki.pl') - some requests that were already handled are sent to the new server and reprocessed.
Why are these old transactions still being held by the original server? Many have long since completed!
Do I need to tell Mojolicious that the request is done before the handler goes off and messes with the data?
(I considered $c->finish(), but that is just for sockets?)
How does Hypnotoad decide which requests should be passed to its replacement server?
Here is some pseudocode showing what I'm doing:
get '/jobStateChange/:jobId/:jobState/:jobCause' => sub {
    my $c = shift;
    my $jobId = $c->stash("jobId");
    return $c->render(text => "invalid jobId: $jobId", status => 400) unless $jobId =~ /^\d+$/;
    my $jobState = $c->stash("jobState");
    return $c->render(text => "invalid jobState: $jobState", status => 400) unless $jobState =~ /^\d+$/;
    my $jobCause = $c->stash("jobCause");
    return $c->render(text => "invalid jobCause: $jobCause", status => 400) unless $jobCause =~ /^\d+$/;
    my $jobLocation = $c->req->param('jobLocation');
    if ($jobLocation) { $jobLocation = $ENV{'DATADIR'} . "/jobs/" . $jobLocation; }
    unless ($jobLocation && -d $jobLocation) {
        app->log->debug("determining jobLocation because passed jobLocation isn't useable");
        $jobLocation = getJobLocation($jobId);
        $c->stash("jobLocation", $jobLocation);
    }
    # TODO - more validation? would BKI lie to us?
    return if $c->tx->res->code && 400 == $c->tx->res->code; # return if we rendered an error above
    # tell BKI we're all set ASAP
    $c->render(text => 'ok');
    handleJobStatusUpdate($c, $jobId, $jobState, $jobCause, $jobLocation);
};

sub handleJobStatusUpdate {
    my ($c, $jobId, $jobState, $jobCause, $jobLocation) = @_;
    app->log->info("job $jobId, state $jobState, cause $jobCause, loc $jobLocation");
    # set the job states in jobs
    app->work_db->do($sql, undef, @params);
    if ($jobState == $SOME_JOB_STATE) {
        ... do stuff ...
        ... uses $c->stash to hold data used by other functions
    }
    if ($jobState == $OTHER_JOB_STATE) {
        ... do stuff ...
        ... uses $c->stash to hold data used by other functions
    }
}
Your request will not be complete until the request handler returns. This little app, for example, will take 5 seconds to output "test":
# test.pl
use Mojolicious::Lite;
get '/test' => sub { $_[0]->render( text => "test" ); sleep 5 };
app->start;
The workaround for your app would be to run handleJobStatusUpdate in a background process.
get '/jobStateChange/:jobId/:jobState/:jobCause' => sub {
    my $c = shift;
    my $jobId = $c->stash("jobId");
    my $jobState = $c->stash("jobState");
    my $jobCause = $c->stash("jobCause");
    my $jobLocation = $c->req->param('jobLocation');
    ...
    $c->render(text => 'ok');
    if (fork() == 0) {
        handleJobStatusUpdate($c, $jobId, $jobState, $jobCause, $jobLocation);
        exit;
    }
};
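As an aside, a bare fork() inside a preforking server like Hypnotoad is easy to get wrong: the child inherits the listening sockets and must exit before touching the event loop. A sketch of the same workaround using Mojolicious' own subprocess support (the empty second callback runs in the parent when the child finishes):

use Mojo::IOLoop;

get '/jobStateChange/:jobId/:jobState/:jobCause' => sub {
    my $c = shift;
    ...
    $c->render(text => 'ok');
    Mojo::IOLoop->subprocess(
        sub { handleJobStatusUpdate($c, $jobId, $jobState, $jobCause, $jobLocation) },
        sub { }  # nothing to do in the parent
    );
};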

How to match a result to a request when sending multiple requests?

A. Summary
As the title says, Guzzle allows you to send multiple requests at once to save time, as described in the documentation.
$responses = $client->send(array(
    $requestObj1,
    $requestObj2,
    ...
));
(given that each request object is an instance of Guzzle\Http\Message\EntityEnclosingRequestInterface)
When the responses come back, to identify which response belongs to which request, we can loop through each request and get its response (only available after executing the above command):
$response1 = $requestObj1->getResponse();
$response2 = $requestObj2->getResponse();
...
B. Problem
If the request objects contain the same data, it's impossible to identify the original request.
Assume we have the following scenario, where we need to create two articles, A and B, on a remote server at something.com/articles/create.json.
Each request has the same POST data:
subject: This is a test article
After creation, two Location responses come back via Guzzle:
something.com/articles/223.json
something.com/articles/245.json
Using the above method to link a response to its request, we still don't know which response is for which article, because the request objects are exactly the same.
Hence in my database I cannot record the result:
article A -> Location: 245.json
article B -> Location: 223.json
because it could just as well be the other way around:
article A -> Location: 223.json
article B -> Location: 245.json
A solution is to put some extra parameter in the POST request, e.g.
subject: This is a test article
record: A
However, the remote server will return an error and not create the article, because it does not understand the key "record". The remote server belongs to a third party and I cannot change the way it works.
Another proper solution would be to set some specific id/tag on the request object so we can identify it afterwards. However, I've looked through the documentation and there is no method to uniquely identify a request, like
$request->setID("id1")
or
$request->setTag("id1")
This has been bugging me for months and I still cannot resolve it.
If you have a solution, please let me know - many thanks, and thanks for reading this long post!
I've found a proper way to do it: Guzzle allows you to add a callback that runs once a request completes, so we can achieve this by setting one on each request in the batch.
Each request is created like this by default:
$request = $client->createRequest('GET', 'http://httpbin.org', [
    'headers' => ['X-Foo' => 'Bar']
]);
So, to achieve what we want:
$allRequests = [];
$allResults = [];
for ($k = 0; $k <= 10; $k++) {
    $allRequests['key_' . $k] = $client->createRequest('GET', 'http://httpbin.org?id=' . $k, [
        'headers' => ['X-Foo' => 'Bar'],
        'events' => [
            'complete' => function ($e) use (&$allResults, $k) {
                $response = $e->getResponse();
                $allResults['key_' . $k] = (string) $response->getBody();
            }
        ]
    ]);
}
$client->sendAll(array_values($allRequests));
print_r($allResults);
Now $allResults holds the result for each corresponding request; e.g. $allResults['key_1'] is the result of $allRequests['key_1'].
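For newer Guzzle (6+), where the 'events' option no longer exists, a comparable sketch keys an array of promises and uses GuzzleHttp\Promise\settle(), which preserves the keys in its result:

use GuzzleHttp\Client;
use GuzzleHttp\Promise;

$client = new Client();
$promises = [];
for ($k = 0; $k <= 10; $k++) {
    $promises['key_' . $k] = $client->getAsync('http://httpbin.org?id=' . $k);
}

// settle() waits for every promise and returns an array with the same keys;
// each entry has a 'state' plus a 'value' (fulfilled) or 'reason' (rejected)
$results = Promise\settle($promises)->wait();
foreach ($results as $key => $result) {
    if ($result['state'] === 'fulfilled') {
        echo $key . ': ' . $result['value']->getBody() . "\n";
    }
}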
I was having the same problem.
I solved it by generating a unique id for each request and adding it to the request URL as a custom query parameter (you will need to remember these ids to match them up afterwards).
After $responses = $client->send($requests) you can iterate through the responses, retrieve the effective URL with $response->getEffectiveUrl(), and parse it (see parse_url and parse_str) to get the custom parameter with the unique id; then search your array of requests for the one that has it.
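For instance, a sketch of that lookup, assuming the unique id was appended as a hypothetical _rid query parameter:

$url = $response->getEffectiveUrl();                    // Guzzle 5 API
parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
if (isset($query['_rid'])) {
    // $query['_rid'] is the unique id; look it up in your map of requests
}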
I found a much better answer.
I was sending batches of 20 requests at a time, 4 concurrently, and used the pooling technique with fulfilled and rejected callbacks, as in the documentation.
I found that I could add this code to the end of my requestAsync() calls when yielding / building the array (I do both in different places):
$request = $request->then(function (\GuzzleHttp\Psr7\Response $response) use ($source_db_object) {
    $response->_source_object = $source_db_object;
    return $response;
});
And then in the closures on the pool, I can just access _source_object on the response normally, and it works great.
It's a little hacky, but as long as you use a name that never clashes with anything in Guzzle, this should be fine.
Here is a full example:
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Response as GuzzleResponse;

$client = new Client();
$requests = [];

// Simple set-up here: generate some async requests
for ($i = 0; $i < 10; $i++) {
    $request = $client->requestAsync('GET', 'https://jsonplaceholder.typicode.com/todos/1');
    // Here we can attach any identifiable data
    $request->_source_object = $i;
    array_push($requests, $request);
}

$generator = function () use ($requests) {
    while ($request = array_pop($requests)) {
        yield function () use ($request) {
            return $request->then(function (GuzzleResponse $response) use ($request) {
                // Attach _source_object from the request to the response
                $response->_source_object = $request->_source_object ?? [];
                return $response;
            });
        };
    }
};

$requestPool = new Pool($client, $generator(), [
    'concurrency' => 5,
    'fulfilled' => function ($response) {
        // Now we can access the _source_object data once the response arrives
        echo $response->_source_object . "\n";
    }
]);
$requestPool->promise()->wait();
I do it this way:
// create your requests
$requests[] = $client->createRequest('GET', '/endpoint', ['config' => ['order_id' => 123]]);
...

// in your success callback get
$id = $event->getRequest()->getConfig()['order_id'];
An update related to the new GuzzleHttp (guzzlehttp/guzzle 6+): concurrent/parallel calls are now run through a few different methods, including promises (see "Concurrent Requests" in the docs).
The old way of passing an array of RequestInterfaces will not work anymore.
See the example here:
$newClient = new \GuzzleHttp\Client(['base_uri' => $base]);

foreach ($documents->documents as $doc) {
    $params = [
        'language' => 'eng',
        'text' => $doc->summary,
        'apikey' => $key
    ];
    $requestArr[$doc->reference] = $newClient->getAsync('/1/api/sync/analyze/v1?' . http_build_query($params));
}

$time_start = microtime(true);
$responses = \GuzzleHttp\Promise\unwrap($requestArr); // instead of $newClient->send($requestArr);
$time_end = microtime(true);
$this->get('logger')->error(' NewsPerf Dev: took ' . ($time_end - $time_start));
In this example, unwrap preserves the keys of the promise array, so you can refer to each response as $responses[$doc->reference]. In short: key your array of promises and run the Promise\unwrap call.
I also came across this issue, and this was the first thread that came up. I know it is a resolved thread, but I eventually came up with a better solution, for anyone else who encounters the issue.
One option is to use Guzzle's Pool::batch.
What batch does is push the results of the pooled requests into an array and return that array, which ensures that the responses are in the same order as the requests.
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Exception\RequestException;

$client = new Client();

// Create the requests
$requests = function ($total) use ($client) {
    for ($i = 1; $i <= $total; $i++) {
        yield new Request('GET', 'http://www.example.com/foo' . $i);
    }
};

// Use Pool::batch()
$pool_batch = Pool::batch($client, $requests(5));
foreach ($pool_batch as $index => $res) {
    if ($res instanceof RequestException) {
        // handle the failed request
        continue;
    }
    // handle the successful response
}

Linking responses to requests with Facebook Batch Requests

I'm using Facebook's Batch Requests to post to multiple feeds and I need to link the correct response to each request in the batch. Since I found no definitive info in the documentation: do the members of the returned array appear in the same order as the requests?
In other words, if I get an error in the third member of the returned array, does that positively mean the error refers to the third request I sent in the batch?
I can use the id for successful requests, but error messages seem generic and do not carry any data linked to the request that generated them (unless I'm missing something).
Yes, that's correct.
My strategy is to create a tracking array as I load up my batch requests. This array correlates the key of my associative array with the numerical order in which I posted the batches. When I loop over the results, I use a counter to step through the tracking array and pull out the proper associative-array index; then I use that to update the associative array with the results from that step of the batch operation.
It would be nice if batching supported the 'name' parameter and that parameter got returned with each response, but that only appears to work if you're using the name to create batch dependencies:
https://developers.facebook.com/docs/reference/api/batch/
Loading up the batches:
$batch = array();
$batches = array();
$titles = array();
foreach ($campaigns as $title => $campaign) {
    if (count($batch) == 20) {
        $batches[] = $batch;
        $batch = array();
    }
    $titles[] = $title; # TRACKING array
    $body = http_build_query($campaign);
    $body = urldecode($body);
    $batch[] = array(
        'method' => 'POST',
        'relative_url' => "/act_{$act}/adcampaigns",
        'body' => $body
    );
}
Processing the batches:
if ($batch) {
    $batches[] = $batch; # flush the final partial batch
}
$counter = 0;
foreach ($batches as $batch) {
    $params = array(
        'access_token' => $access_token,
        'batch' => json_encode($batch)
    );
    $responses = $facebook->api('/', 'POST', $params);
    foreach ($responses as $response) {
        $response = json_decode($response['body'], 1);
        $campaign_id = $response['id'];
        $title = $titles[$counter]; # RETRIEVING THE INDEX FROM THE TRACKING ARRAY
        $campaigns[$title]['campaign_id'] = $campaign_id;
        $counter++; # INCREMENTING THE COUNTER
    }
}
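Since the responses come back in request order, a compact variant of the tracking loop (a sketch that assumes a single batch in which every request succeeded) simply pairs the two arrays by index:

// $titles[$i] corresponds to $responses[$i] because Facebook
// preserves the order of the requests within a batch
foreach ($responses as $i => $response) {
    $body = json_decode($response['body'], true);
    $campaigns[$titles[$i]]['campaign_id'] = $body['id'];
}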