I've written a Perl script that, put simply, will pull data from an Elasticsearch database and print it out in a specific format.
If any particular snippets of code would be helpful to see in solving my issue, let me know and I'll be more than happy to post them here. I won't cut and paste the whole script as it is almost 1000 lines long & it's my company's property.
Modules currently used: strict, warnings, LWP::UserAgent, CGI, POSIX, JSON, Modern::Perl, Term::ANSIColor, & Scalar::Util
Declarations and such:
# Declare user agent object
my $ua = LWP::UserAgent->new;
# Set custom HTTP request header fields
my $req = HTTP::Request->new( POST => $serverEndpoint );
$req->header( 'content-type' => 'application/json' );
my $post_data = '{
"fields" : [' . $arrayList . '],
"sort" : [
{ "#timestamp" : { "order" : "asc" } }
],
"query" : {
"filtered" : {
"filter" : {
"range" : {
"#timestamp" : {
"gte" : "' . $lowerBound . '",
"lte" : "' . $upperBound . '"
}
}
}
}
}
}';
$arrayList is previously defined to be a string of fields with quotes around them (e.g. "field1","field2","field3").
# Receives results from ES (this is Perl syntax for querying ES)
$req->content( $post_data );
$resp = $ua->request( $req );
$myResults = $resp->content();
#say $myResults; die;
# Changes string (as returned by http request) to JSON format that is compatible with Perl
$decoded = JSON::XS::decode_json( $myResults );
#data = #{ $decoded->{ "hits" }{ "hits" } };
#tempResponse = #data;
my $lengthOfArray = scalar #tempResponse;
At this point, #data has the information I need. I've checked it, it looks right. I saved the length of the current response for future use.
$scrollID = $decoded->{ "_scroll_id" };
I save the scroll ID for the next part.
Now that I have the initial set of data, I repeatedly query the database until (well, at least, it's supposed to) the database has been completely queried.
I can tell when the database has been fully queried if $lengthOfArray < 0. If this is true, then there is no more data to get.
while ( $lengthOfArray > 0 ) {
$ua = LWP::UserAgent->new;
$serverEndpoint = "http://localhost:9200/_search/scroll?scroll=1m&scroll_id=" . $scrollID;
$req = HTTP::Request->new( POST => $serverEndpoint );
$req->header( 'content-type' => 'application/json' );
$req->content( $post_data );
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
$resp = $ua->request( $req );
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
$myResults = $resp->content();
# Changes string (as returned by http request) to JSON format that is compatible with Perl
$decoded = JSON::XS::decode_json( $myResults );
#tempResponse = #{ $decoded->{ "hits" }{ "hits" } };
#print "#tempResponse\n";
#data = ( #data, #tempResponse );
$lengthOfArray = scalar #tempResponse;
}
The data set I'm working with is enormous. Everything goes perfectly well (I've tested it. If I only let it run for 600 times through the loop, it works no problem) until it gets to loop count #801. At the 801st time through the loop, it gets hung up. It sits there for a good minute or so before dying with the error message:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "read timeout at /usr...")
I've traced down the error to the line that has all the !!!!!s around it above. The loop gets hung up on that line on the 801st time through the loop.
There are no real indicators as to why this error is happening. Again, it works if I go through the loop 800 times, but not 801.
I know all of the data is in Elasticsearch. There are about 12,000 hashes that I should be getting from it. I can access the first 800 hashes via #data[0], #data[1] etc. but after that I'm out of luck.
Any help would be greatly appreciated. I spent my whole 9-hour work day today trying to figure this out with no luck. Really, if you could even ask for clarification on some part of my explanation that might be enough to find the answer.
So, if there's anything I can do to clear up what I've typed (show what the ES URL gives), please let me know.
Thank you very, very much!
==========================================================
EDIT #1: I've found the source of the problem. It doesn't make sense, but here it is.
In $post_data I have $arrayList. It contains a list of 36 fields that I had previously taken from an array of fields. It's formatted like this:
"field1","field2","field3","field4"
I noticed that if I remove ONE of those fields, doesn't matter which one, the request goes through without a problem.
It does not matter which field it is.
==========================================================
EDIT #2: This may be useful. If I let the request time out, it gives the following error message:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "read timeout at /usr..." at *scriptName.pl* line 403, <STDIN> line 5.
Line 403 is as follows:
$decoded = JSON::XS::decode_json( $myResults );
It's right below the
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
$resp = $ua->request( $req );
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==========================================================
EDIT #3: I tried printing out $resp->content() to see if anything looks strange before it gets hung up. It doesn't initially ... The content looks just like it should.
After awhile, though, it gives up and prints out the following message:
{"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures{SearchContextMissingException[No search context found for id [2378]]}{SearchContextMissingException[No search context found for id [2379]]}{SearchContextMissingException[No search context found for id [2380]]}","status:404"}
This completely covers the screen until I interrupt it.
==========================================================
FINAL EDIT: IT WORKS!
Nothing really needed to be changed in the declarations section. I did change the size of the query so that less queries had to be made (I think ES's RAM was getting filled). So now this is what the $post_data looks like:
my $post_data = '{
"size": 1000,
"fields" : [' . $arrayList . '],
"sort" : [
{ "#timestamp" : { "order" : "asc" } }
],
"query" : {
"filtered" : {
"filter" : {
"range" : {
"#timestamp" : {
"gte" : "' . $lowerBound . '",
"lte" : "' . $upperBound . '"
}
}
}
}
}
}';
Notice how the size has been changed near the top. I don't think that this was what fixed my issue, but it should help performance, anyway.
The problem I believe I was having was with the while loop. Originally the loop was designed to continue to run until the hits array was empty, but for some reason it kept going even after that. No idea why, maybe I'll figure that out later.
What I did instead was check to see if one of the members that I expect to find within the hits array is defined. If it isn't defined, it fails the loop.
There were a couple other minor changes, but that was the real big one. It works great now! ... only 2 days later.
Thanks Stack Overflow!
every scroll request should use the most recent scroll id
i.e scroll_id returned in previous scroll response.
Looking from the code excerpt looks like you are using the the scroll id from the first response probably changing that to use the most recent scroll_id should help
i.e. in the while block you would need $scrollID = $decoded->{ "_scroll_id" };
while ( $lengthOfArray > 0 ) {
$ua = LWP::UserAgent->new;
$serverEndpoint = "http://localhost:9200/_search/scroll?scroll=1m&scroll_id=" . $scrollID;
$req = HTTP::Request->new( POST => $serverEndpoint );
$req->header( 'content-type' => 'application/json' );
$req->content( $post_data );
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
$resp = $ua->request( $req );
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
$myResults = $resp->content();
# Changes string (as returned by http request) to JSON format that is compatible with Perl
$decoded = JSON::XS::decode_json( $myResults );
$scrollID = $decoded->{ "_scroll_id" };
#tempResponse = #{ $decoded->{ "hits" }{ "hits" } };
#print "#tempResponse\n";
#data = ( #data, #tempResponse );
$lengthOfArray = scalar #tempResponse;
}
probably this should help.
Related
I am trying to create a slack application in Perl with mojolicious and I am having the following use case:
Slack sends a request to my API from a slash command and needs a response in a 3 seconds timeframe. However, Slack also gives me the opportunity to send up to 5 more responses in a 30 minute timeframe but still needs an initial response in 3 seconds (it just sends a "late_response_url" in the initial call back so that I could POST something to that url later on). In my case I would like to send an initial response to slack to inform the user that the operation is "running" and after a while send the actual outcome of my slow function to Slack.
Currently, I can do this by spawning a second process using fork() and using one process to respond imidiately as Slack dictates and the second to do the rest of the work and respond later on.
I am trying to do this with Mojolicious' subprocesses to avoid using fork(). However I can't find a way to get this to work....
a sample code of what I am already doing with fork is like this:
sub withpath
{
my $c = shift;
my $user = $c->param('user_name');
my $response_body = {
response_type => "ephemeral",
text => "Running for $user:",
attachments => [
{ text => 'analyze' },
],
};
my $pid = fork();
if($pid != 0){
$c->render( json => $response_body );
}else{
$output = do_time_consuming_things()
$response_body = {
response_type => "in-channel",
text => "Result for $user:",
attachments => [
{ text => $output },
],
};
my $ua = Mojo::UserAgent->new;
my $tx = $ua->post(
$response_url,
{ Accept => '*/*' },
json => $response_body,
);
if( my $res = $tx->success )
{
print "\n success \n";
}
else
{
my $err = $tx->error;
print "$err->{code} response: $err->{message}\n" if $err->{code};
print "Connection error: $err->{message}\n";
}
}
}
So the problem is that no matter how I tried I couldn't replicate the exact same code with Mojolicious' subproccesses. Any ideas?
Thanks in advance!
Actually I just found a solution to my problem!
So here is my solution:
my $c = shift; #receive request
my $user = $c->param('user_name'); #get parameters
my $response_url = $c->param('response_url');
my $text = $c->param('text');
my $response_body = { #create the imidiate response that Slack is waiting for
response_type => "ephemeral",
text => "Running for $user:",
attachments => [
{ text => 'analyze' },
],
};
my $subprocess = Mojo::IOLoop::Subprocess->new; #create the subprocesses
$subprocess->run(
sub {do_time_consuming_things($user,$response_url,$text)}, #this callback is the
#actuall subprocess that will run in background and contains the POST request
#from my "fork" code (with the output) that should send a late response to Slack
sub {# this is a dummy subprocess doing nothing as this is needed by Mojo.
my ($subprocess, $err, #results) = #_;
say $err if $err;
say "\n\nok\n\n";
}
);
#and here is the actual imidiate response outside of the subprocesses in order
#to avoid making the server wait for the subprocess to finish before responding!
$c->render( json => $response_body );
So I actually simply had to put my code of do_time_consuming_things in the first callback (in order for it to run as a subprocess) and use the second callback (that is actually linked to the parent process) as a dummy one and keep my "imidiate" response in the main body of the whole function instead of putting it inside one of the subprocesses. See code comments for more information!
I'm working with some Perl code that I don't understand:
my $tmpdir = "XXX";
my $src = "tmp" . time . int rand 10 . ".jpg";
while ( -s "$tmpdir/$src" ) {
$src = "tmp" . time . int rand 10 . ".jpg";
}
my $ua = LWP::UserAgent->new( keep_alive => 0, timeout => 10 );
my $req = HTTP::Request->new(
"GET",
"https://www.com/act/couponimage/1877",
);
my $res = $ua->request( $req, "$tmpdir/$src" );
if ( ! $res || ! -s "$tmpdir/$src" ) {
header( 301, 0, 0, "https://www.com/" );
exit 0;
}
For some reason it hits the point where it redirects to the main page (the header code).
There is clearly something wrong here because it never executes past the last if clause which redirects. Maybe it has to do with using HTTPS?
It is not a filesystem issue. The following coupon image path works fine
Xxx.com/img/coupon-2600.jpg
The request should be working fine,
but as pointed out, you really should check what version of the packages you are using.
For me ( perl 5.22 - LWP::UserAgent 6.26 ) the call $ua->req( $req ) needs to be $ua->request( $req->uri ). And the file to store the response in must be given with $ua->request( $req->uri, :content_file => $fn ) .
I think the original author assumed the content file would be created by giving the filename to $ua->request. With :content_file, that is what's happening for me. He put the while ( -s $src ) in place to ensure he got a filename that was not yet taken.
Thus, I believe the solution to your problem is to change
my $res = $ua->request($req, "$tmpdir/$src") to
$ua->request($req->uri, ':content_file' => "$tmpdir/$src" );
The docs mention that using :content_file means the actual response content is no longer stored in the object, so you may have to adapt your code.
If all you need is the file, I would adapt the final if to check for the file that should now exist ( which should, I believe, only happen upon successful response ).
If what you need is the $res object to use later, I would probably opt to check the response worked, then print the content to the file you want it in, so just do
my $res = $ua->request( $req->uri );
adapt the if accordingly and print content to file.
I would personally prefer the latter, as I can check the response for success, then print: this seems much clearer.
I'm attempting to use a particular web service, and I can successfully perform the upload with the following command:
curl -X POST --header "Transfer-Encoding: chunked" -d #Downloads/file.pdf https://some.webservice/upload
I get back a json response indicate success.
However, I'm unable to figure out how to do the same with WWW::Mechanize.
$mech->post("https://" . $server . "/upload", Content_Type => 'multipart/form-data', Content => [upID => $upid, name => $dlfile, userID => 0, userK => 0, file_0 => [$dlfile]]);
This receives a similar json response with a big fat error message in it.
Do I need to explicitly set the Transfer-Encoding header first? Is there some other trick to it? Google's not shedding much light on this, Perlmonks neither, and the documentation's a little obtuse.
You can do it using HTTP::Request::StreamingUpload
my $starttime = time();
my $req = HTTP::Request::StreamingUpload->new(
POST => $url,
path => $file,
headers => HTTP::Headers->new(
'Transfer-Encoding' => 'chunked'
),
);
my $gen = $req->content;
die unless ref($gen) eq "CODE";
my $total = 0;
$req->content(sub {
my $chunk = &$gen();
$total += length($chunk);
print "\r$total / $size bytes ("
. int($total/$size*100)
. "%) sent, "
. int($total/1000/(time()-$starttime+1))
. " k / sec ";
return $chunk;
});
my $resp = $ua->request($req);
print "\n";
unless ($resp->is_success) {
die "Failed uploading the file: ", $resp->status_line;
}
my $con = $resp->content;
return $con;
Do you really need WWW::Mechanize? It is a subclass of LWP::UserAgent with additional functionality that gives browser-like functionality like filling in and submitting forms, clicking links, a page history with a "back" operation etc. If you don't need all of that then you may as well use LWP::UserAgent directly
Either way, the post method is inherited unchanged from LWP::UserAgent, and it's fine to use it directly as you have done
The way to send a chunked POST is to set the Content to a reference to a subroutine. The subroutine must return the next chunk of data each time it is called, and finally ann empty string or undef when there is no more to send
Is the data supposed to be a JSON string?
It's easiest to write a factory subroutine that returns a closure, like this
sub make_callback {
my ($data) = shift;
sub { substr($data, 0, 512, "") }
}
Then you can call post like this
my $payload = to_json(...);
$mech->post(
"https://$server/upload",
Content_Type => 'multipart/form-data',
Content => make_callback($payload)
);
Please be aware that all of this is untested
in my stage server I would like to activate the debug so the clients can find errors for themselves before the app goes to the production server.
BUT I only want the first part of the message, not the Request, or the Session Data.
For example: Couldn't render template "templates/home.tt2: file error - templates/inc/heater: not found".
The message is enough for me and for my client to see that the "header" call is misspelled.
The Request has a lot of irrelevant information for the client, but also has A LOT of internal developing information that should be hidden all the time!!
Regards
What you want is to override Catalyst's dump_these method. This returns a list of things to display on Catalyst's error debugging page.
The default implementation looks like:
sub dump_these {
my $c = shift;
[ Request => $c->req ],
[ Response => $c->res ],
[ Stash => $c->stash ],
[ Config => $c->config ];
}
but you can make it more restrictive, for example
sub dump_these {
my $c = shift;
return [ Apology => "We're sorry that you encountered a problem" ],
[ Response => substr($c->res->body, 0, 512) ];
}
You would define dump_these in your app's main module -- the one where you use Catalyst.
I had a similar problem that I solved by overriding the Catalyst method log_request_parameters.
Something like this (as #mob said, put it in your main module):
sub log_request_parameters {
my $c = shift;
my %all_params = #_;
my $copy = Clone::clone(\%all_params); # don't change the 'real' request params
# Then, do anything you want to only print what matters to you,
# for example, to hide some POST parameters:
my $body = $copy->{body} || {};
foreach my $key (keys %$body) {
$body->{$key} = '****' if $key =~ /password/;
}
return $c->SUPER::log_request_parameters( %$copy );
}
But you could also simply return at the beginning, if you don't want any GET/POST parameters displayed.
Well, I didn't think of the more obvious solution, in your case: you could simply set your log level to something higher than debug, which would prevent these debug logs from being displayed, but would keep the error logs:
# (or a similar condition to check you are not on the production server)
if ( !__PACKAGE__->config->{dev} ) {
__PACKAGE__->log->levels( 'warn', 'error', 'fatal' ) if ref __PACKAGE__->log;
}
I am having problems creating an array that I can pass as a form using LWP. Basic code is
my $ua = LWP::UserAgent->new();
my %form = { };
$form->{'Submit'} = '1';
$form->{'Action'} = 'check';
for (my $i=0; $i<1; $i++) {
$form->{'file_'.($i+1)} = [ './test.txt' ];
$form->{'desc_'.($i+1)} = '';
}
$resp = $ua->post('http://someurl/test.php', 'Content_Type' => 'multipart/form-data'
, 'Content => [ \%form ]');
if ($resp->is_success()) {
print "OK: ", $resp->content;
}
} else {
print $claimid->as_string;
}
I guess I am not creating the form array correctly or using the wrong type as when I check the _POST variables in test.php nothing has been set :(
The problem is that for some reason you've enclosed your form values in single quotes. You want to send the data structure. E.g.:
$resp = $ua->post('http://someurl/test.php',
'Content_Type' => 'multipart/form-data',
'Content' => \%form);
You want to either send the hash reference of %form, not the has reference contained within an array reference as you had ([ \%form ]). If you had wanted to send the data as an array reference, then you'd just use[ %form ]` which populates the array with the key/value pairs from the hash.
I'd suggest that you read the documentation for HTTP::Request::Common, the POST section in particular for a cleaner way of doing this.