I'm working with some Perl code that I don't understand:
my $tmpdir = "XXX";
my $src = "tmp" . time . int rand 10 . ".jpg";
while ( -s "$tmpdir/$src" ) {
    $src = "tmp" . time . int rand 10 . ".jpg";
}
my $ua = LWP::UserAgent->new( keep_alive => 0, timeout => 10 );
my $req = HTTP::Request->new(
    "GET",
    "https://www.com/act/couponimage/1877",
);
my $res = $ua->request( $req, "$tmpdir/$src" );
if ( ! $res || ! -s "$tmpdir/$src" ) {
    header( 301, 0, 0, "https://www.com/" );
    exit 0;
}
For some reason it always reaches the redirect to the main page (the header call). Something is clearly wrong, because execution never gets past that final if clause. Could it have something to do with using HTTPS?
It is not a filesystem issue; the following coupon image path works fine:
Xxx.com/img/coupon-2600.jpg
The request should be working fine,
but as pointed out, you really should check what version of the packages you are using.
For me (perl 5.22, LWP::UserAgent 6.26), the file to store the response in has to be given through the ':content_file' option, which the convenience methods such as get support: $ua->get( $req->uri, ':content_file' => $fn ).
I think the original author assumed the content file would be created by giving the filename to $ua->request. With :content_file, that is what's happening for me. He put the while ( -s $src ) in place to ensure he got a filename that was not yet taken.
Thus, I believe the solution to your problem is to change
my $res = $ua->request( $req, "$tmpdir/$src" );
to
my $res = $ua->get( $req->uri, ':content_file' => "$tmpdir/$src" );
The docs mention that using :content_file means the actual response content is no longer stored in the object, so you may have to adapt your code.
If all you need is the file, I would adapt the final if to check for the file that should now exist ( which should, I believe, only happen upon successful response ).
If what you need is the $res object to use later, I would opt to check that the response worked and then print the content to the file you want it in, so just do
my $res = $ua->request( $req );
adapt the if accordingly, and print the content to the file.
I would personally prefer the latter, as I can check the response for success, then print: this seems much clearer.
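A minimal sketch of that latter approach (untested; the URL, the header() helper, and the variables are taken from the question):
my $res = $ua->request( $req );
if ( $res->is_success ) {
    open my $fh, '>', "$tmpdir/$src" or die $!;
    binmode $fh;                      # the response is image data
    print $fh $res->content;
    close $fh;
}
else {
    header( 301, 0, 0, "https://www.com/" );
    exit 0;
}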
I'm attempting to use a particular web service, and I can successfully perform the upload with the following command:
curl -X POST --header "Transfer-Encoding: chunked" -d @Downloads/file.pdf https://some.webservice/upload
I get back a JSON response indicating success.
However, I'm unable to figure out how to do the same with WWW::Mechanize.
$mech->post("https://" . $server . "/upload", Content_Type => 'multipart/form-data', Content => [upID => $upid, name => $dlfile, userID => 0, userK => 0, file_0 => [$dlfile]]);
This receives a similar JSON response with a big fat error message in it.
Do I need to explicitly set the Transfer-Encoding header first? Is there some other trick to it? Google's not shedding much light on this, Perlmonks neither, and the documentation's a little obtuse.
You can do it using HTTP::Request::StreamingUpload:
use LWP::UserAgent;
use HTTP::Request::StreamingUpload;

my $ua        = LWP::UserAgent->new;  # assumed; not shown in the original
my $size      = -s $file;             # total size, for the progress display
my $starttime = time();

my $req = HTTP::Request::StreamingUpload->new(
    POST    => $url,
    path    => $file,
    headers => HTTP::Headers->new(
        'Transfer-Encoding' => 'chunked',
    ),
);

# wrap the module's content generator so we can report progress
my $gen = $req->content;
die unless ref($gen) eq "CODE";

my $total = 0;
$req->content(sub {
    my $chunk = &$gen();
    return $chunk unless defined $chunk; # end of file: pass the undef through
    $total += length($chunk);
    print "\r$total / $size bytes ("
        . int($total/$size*100)
        . "%) sent, "
        . int($total/1000/(time()-$starttime+1))
        . " k / sec ";
    return $chunk;
});

my $resp = $ua->request($req);
print "\n";

unless ($resp->is_success) {
    die "Failed uploading the file: ", $resp->status_line;
}
my $con = $resp->content;
return $con;
Do you really need WWW::Mechanize? It is a subclass of LWP::UserAgent that adds browser-like behaviour such as filling in and submitting forms, clicking links, and a page history with a "back" operation. If you don't need any of that, then you may as well use LWP::UserAgent directly.
Either way, the post method is inherited unchanged from LWP::UserAgent, and it's fine to use it directly as you have done.
The way to send a chunked POST is to set the Content to a reference to a subroutine. The subroutine must return the next chunk of data each time it is called, and finally an empty string or undef when there is no more to send.
Is the data supposed to be a JSON string?
It's easiest to write a factory subroutine that returns a closure, like this:
sub make_callback {
    my $data = shift;
    # each call removes and returns the next 512-byte chunk of $data;
    # the final empty string tells LWP that the body is complete
    sub { substr($data, 0, 512, "") }
}
Then you can call post like this:
my $payload = to_json(...);

$mech->post(
    "https://$server/upload",
    Content_Type => 'multipart/form-data',
    Content      => make_callback($payload),
);
Please be aware that all of this is untested.
I've written a Perl script that, put simply, will pull data from an Elasticsearch database and print it out in a specific format.
If any particular snippets of code would be helpful to see in solving my issue, let me know and I'll be more than happy to post them here. I won't paste the whole script, as it is almost 1000 lines long and it's my company's property.
Modules currently used: strict, warnings, LWP::UserAgent, CGI, POSIX, JSON, Modern::Perl, Term::ANSIColor, & Scalar::Util
Declarations and such:
# Declare user agent object
my $ua = LWP::UserAgent->new;
# Set custom HTTP request header fields
my $req = HTTP::Request->new( POST => $serverEndpoint );
$req->header( 'content-type' => 'application/json' );
my $post_data = '{
    "fields" : [' . $arrayList . '],
    "sort" : [
        { "@timestamp" : { "order" : "asc" } }
    ],
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "@timestamp" : {
                        "gte" : "' . $lowerBound . '",
                        "lte" : "' . $upperBound . '"
                    }
                }
            }
        }
    }
}';
$arrayList is previously defined to be a string of fields with quotes around them (e.g. "field1","field2","field3").
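For illustration, $arrayList might be built like this (the field names are hypothetical):
my @fields = qw(field1 field2 field3);
my $arrayList = join ',', map { qq("$_") } @fields;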
# Receives results from ES (this is Perl syntax for querying ES)
$req->content( $post_data );
$resp = $ua->request( $req );
$myResults = $resp->content();
#say $myResults; die;

# Changes string (as returned by http request) to JSON format that is compatible with Perl
$decoded = JSON::XS::decode_json( $myResults );
@data = @{ $decoded->{ "hits" }{ "hits" } };
@tempResponse = @data;
my $lengthOfArray = scalar @tempResponse;
At this point, @data has the information I need. I've checked it, it looks right. I saved the length of the current response for future use.
$scrollID = $decoded->{ "_scroll_id" };
I save the scroll ID for the next part.
Now that I have the initial set of data, I repeatedly query the database until (well, at least, it's supposed to) the database has been completely queried.
I can tell that the database has been fully queried when $lengthOfArray == 0. If that is true, then there is no more data to get.
while ( $lengthOfArray > 0 ) {
    $ua = LWP::UserAgent->new;
    $serverEndpoint = "http://localhost:9200/_search/scroll?scroll=1m&scroll_id=" . $scrollID;
    $req = HTTP::Request->new( POST => $serverEndpoint );
    $req->header( 'content-type' => 'application/json' );
    $req->content( $post_data );

    #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    $resp = $ua->request( $req );
    #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    $myResults = $resp->content();

    # Changes string (as returned by http request) to JSON format that is compatible with Perl
    $decoded = JSON::XS::decode_json( $myResults );
    @tempResponse = @{ $decoded->{ "hits" }{ "hits" } };
    #print "@tempResponse\n";
    @data = ( @data, @tempResponse );
    $lengthOfArray = scalar @tempResponse;
}
The data set I'm working with is enormous. Everything goes perfectly well (I've tested it: if I only let it run 600 times through the loop, it works with no problem) until it gets to loop iteration #801. On the 801st time through the loop, it gets hung up. It sits there for a good minute or so before dying with the error message:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "read timeout at /usr...")
I've traced down the error to the line that has all the !!!!!s around it above. The loop gets hung up on that line on the 801st time through the loop.
There are no real indicators as to why this error is happening. Again, it works if I go through the loop 800 times, but not 801.
I know all of the data is in Elasticsearch. There are about 12,000 hashes that I should be getting from it. I can access the first 800 hashes via $data[0], $data[1], etc., but after that I'm out of luck.
Any help would be greatly appreciated. I spent my whole 9-hour work day today trying to figure this out with no luck. Really, if you could even ask for clarification on some part of my explanation that might be enough to find the answer.
So, if there's anything I can do to clear up what I've typed (show what the ES URL gives), please let me know.
Thank you very, very much!
==========================================================
EDIT #1: I've found the source of the problem. It doesn't make sense, but here it is.
In $post_data I have $arrayList. It contains a list of 36 fields that I had previously taken from an array of fields. It's formatted like this:
"field1","field2","field3","field4"
I noticed that if I remove any ONE of those fields, the request goes through without a problem. It does not matter which field it is.
==========================================================
EDIT #2: This may be useful. If I let the request time out, it gives the following error message:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "read timeout at /usr..." at *scriptName.pl* line 403, <STDIN> line 5.
Line 403 is as follows:
$decoded = JSON::XS::decode_json( $myResults );
It's right below the
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
$resp = $ua->request( $req );
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==========================================================
EDIT #3: I tried printing out $resp->content() to see if anything looks strange before it gets hung up. It doesn't initially ... The content looks just like it should.
After a while, though, it gives up and prints out the following message:
{"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures{SearchContextMissingException[No search context found for id [2378]]}{SearchContextMissingException[No search context found for id [2379]]}{SearchContextMissingException[No search context found for id [2380]]}","status:404"}
This completely covers the screen until I interrupt it.
==========================================================
FINAL EDIT: IT WORKS!
Nothing really needed to be changed in the declarations section. I did change the size of the query so that fewer queries had to be made (I think ES's RAM was getting filled). So now this is what $post_data looks like:
my $post_data = '{
    "size" : 1000,
    "fields" : [' . $arrayList . '],
    "sort" : [
        { "@timestamp" : { "order" : "asc" } }
    ],
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "@timestamp" : {
                        "gte" : "' . $lowerBound . '",
                        "lte" : "' . $upperBound . '"
                    }
                }
            }
        }
    }
}';
Notice how the size has been changed near the top. I don't think that this was what fixed my issue, but it should help performance, anyway.
The problem I believe I was having was with the while loop. Originally the loop was designed to continue to run until the hits array was empty, but for some reason it kept going even after that. No idea why, maybe I'll figure that out later.
What I did instead was check whether one of the members that I expect to find within the hits array is defined. If it isn't defined, the loop exits, as sketched below.
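A rough sketch of that check (the "fields" key is just what my hits happen to contain; adjust it to your documents):
while (1) {
    # ... issue the scroll request and decode $myResults as above ...
    @tempResponse = @{ $decoded->{ "hits" }{ "hits" } };
    last unless @tempResponse && defined $tempResponse[0]{ "fields" };
    push @data, @tempResponse;
}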
There were a couple other minor changes, but that was the real big one. It works great now! ... only 2 days later.
Thanks Stack Overflow!
Every scroll request should use the most recent scroll id, i.e. the scroll_id returned in the previous scroll response.
Looking at the code excerpt, it appears you are using the scroll id from the first response for every request; changing that to use the most recent scroll_id should help.
That is, in the while block you need $scrollID = $decoded->{ "_scroll_id" };
while ( $lengthOfArray > 0 ) {
    $ua = LWP::UserAgent->new;
    $serverEndpoint = "http://localhost:9200/_search/scroll?scroll=1m&scroll_id=" . $scrollID;
    $req = HTTP::Request->new( POST => $serverEndpoint );
    $req->header( 'content-type' => 'application/json' );
    $req->content( $post_data );

    #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    $resp = $ua->request( $req );
    #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    $myResults = $resp->content();

    # Changes string (as returned by http request) to JSON format that is compatible with Perl
    $decoded = JSON::XS::decode_json( $myResults );
    $scrollID = $decoded->{ "_scroll_id" };
    @tempResponse = @{ $decoded->{ "hits" }{ "hits" } };
    #print "@tempResponse\n";
    @data = ( @data, @tempResponse );
    $lengthOfArray = scalar @tempResponse;
}
Hopefully this helps.
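As a side note, Elasticsearch also accepts the scroll id in a JSON body rather than in the query string; a rough sketch (untested against your cluster):
use JSON::XS qw(encode_json);

$req = HTTP::Request->new( POST => 'http://localhost:9200/_search/scroll' );
$req->header( 'content-type' => 'application/json' );
$req->content( encode_json( { scroll => '1m', scroll_id => $scrollID } ) );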
I have a CGI server side script that accepts GET and POST, with login parameters.
I want to test it to make sure it is not vulnerable. The plan is to use Perl's LWP, send the login parameters via both GET and POST, and compare the results. The interface has been changed so that only via POST can we send the username and password, in session cookies (not sure that is a great idea), so how do I test it? Here is what I have so far:
#!/usr/bin/perl
use LWP;
print "This is libwww-perl-$LWP::VERSION\n";
# Create a user agent object
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");
# Create a request
#my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
#my $req = HTTP::Request->new(GET => 'https://qa.co.net:443/cgi-bin/n-cu.cgi');
my $req = HTTP::Request->new(GET => 'https://qa.co.net:443/cgi-bin/n-cu.cgi?mode=frameset&JScript=1&remote_user&login=foo&password=foo');
$req->content_type('application/x-www-form-urlencoded');
$req->content('query=libwww-perl&mode=dist');
# Pass request to the user agent and get a response back
my $res = $ua->request($req);
# Check the outcome of the response
if ($res->is_success) {
    print $res->content;
    #print $res->code;
    #print $res->message;
}
else {
    print $res->status_line, "\n";
}
This is not going to do it, since it does not have the session cookie stuff. But it might be a good start. Is this the right way to test GET and POST?
Here is what was implemented in the CGI:
# cr_login for POST && login for GET -- leave GET param as it used to be.
if ($m eq 'GET' && defined($req->param('login'))) {
    $msg = 'parameter "login" is invalid for this request type.';
    my $seclog = $event_logging_directory . '/invalid_request.log';
    open(S, ">>$seclog") or die $!;
    my $logmsg = sprintf("%4d-%02d-%02d %02d:%02d:%02d", Today_and_Now())
        . "|mode:" . $req->param('mode')
        . "|login:" . $req->param('login')
        . "|remote_addr:" . $ENV{REMOTE_ADDR}
        . "|$msg\n";
    print S $logmsg;
and:
POST requests to n-cu.cgi should use the parameter "cr_login". If the parameter "login" is passed in a POST request, it should throw an error and return to the login screen.
GET requests to n-cu.cgi should use the parameter "login". If the parameter "cr_login" is passed in a GET request, it should throw an error and return to the login screen.
So here is how we do it.
Keep the session cookie and context alive:
my $browser = LWP::UserAgent->new(keep_alive => 10);
$browser->cookie_jar( {} );
$browser->agent('Mozilla/8.0');
#$browser->ssl_opts({ verify_hostname => 0 });
$browser->show_progress(1);
and later, print the response:
print "Cookies:\n", Dumper($browser->cookie_jar()), "\n\n";
my $content = $response->as_string;
print "$content\n";
Sending password in a cookie? Nope.
Disallow GET for /login.
POST username and password to /login, over SSL.
In CGI, the GET/POST is indicated via the REQUEST_METHOD environment variable.
You cannot stop determined people from issuing a GET request to your server, but you can refuse to process it like so (untested code - you have to fill in details):
if ($ENV{REQUEST_METHOD} ne 'POST') {
    # issue a redirect to a suitable error page, then return.
}

my $q = CGI->new();
my $user = $q->param('username');
my $password = $q->param('password');
my $encrypted_password = my_password_encryptor($password);

unless ( can_log_in($user, $encrypted_password) ) {
    # issue an error message - redirect&return or fall-through...
}
else {
    $session->set_user_logged_in();
}
Most people do not roll their own authentication or session handling. They mostly use one from CPAN, or one included with the larger app framework. If you're doing CGI, you can use CGI::Session.
You might give CGI::Application and/or its offspring a look. Those authors have already solved a bunch of the problems that you're encountering.
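For instance, a minimal sketch of CGI::Session (the storage directory and parameter name are placeholders):
use CGI;
use CGI::Session;

my $q       = CGI->new;
my $session = CGI::Session->new( undef, $q, { Directory => '/tmp' } );
print $session->header();          # sends the session cookie to the client
$session->param( user => $user );  # record the logged-in user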
I use WWW::Curl to upload files:
use WWW::Curl::Easy 4.14;
use WWW::Curl::Form;
my $url = 'http://example.com/backups/?sid=12313qwed323';
my $params = {
    name    => 'upload',
    action  => 'keep',
    backup1 => [ '/tmp/backup1.zip' ], # 1st file for upload
};
my $form = WWW::Curl::Form->new();
foreach my $k (keys %{$params}) {
    if (ref $params->{$k}) {
        $form->formaddfile(@{$params->{$k}}[0], $k, 'multipart/form-data');
    } else {
        $form->formadd($k, $params->{$k});
    }
}
my $curl = WWW::Curl::Easy->new() or die $!;
$curl->setopt(CURLOPT_HTTPPOST, $form);
$curl->setopt(CURLOPT_URL, $url);
my $body;
$curl->setopt(CURLOPT_WRITEDATA, \$body);
my $retcode = $curl->perform();
my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
Nothing special here, and this code works well.
I want to upload large files without preloading everything into memory, which is what I hear libcurl does.
CURLOPT_READFUNCTION accepts a callback which returns parts of the content. That means I cannot use WWW::Curl::Form to set the POST parameters, but instead have to return the whole content through this callback. Is that right?
I think that the code could look like this:
use WWW::Curl::Easy 4.14;
use URI::Escape qw(uri_escape_utf8);

my $url = 'http://example.com/backups/?sid=12313qwed323';
my $params = {
    name    => 'upload',
    action  => 'keep',
    backup1 => [ '/tmp/backup1.zip' ], # 1st file for upload
};

my $fields;
foreach my $k (keys %{$params}) {
    $fields .= "$k=" . (ref $params->{$k} ? '@' . @{$params->{$k}}[0] : uri_escape_utf8($params->{$k})) . "&";
}
chop($fields);

my $curl = WWW::Curl::Easy->new() or die $!;
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_POSTFIELDS, $fields); # is it needed with READFUNCTION??
$curl->setopt(CURLOPT_URL, $url);

my @header = ('Content-type: multipart/form-data', 'Transfer-Encoding: chunked');
$curl->setopt(CURLOPT_HTTPHEADER, \@header);
#$curl->setopt(CURLOPT_INFILESIZE, $size);

$curl->setopt(CURLOPT_READFUNCTION, sub {
    # which data to return here?
    # $params (without the file) + file content?
    return 0;
});
Which data does the CURLOPT_READFUNCTION callback have to return? $params plus the file content? In which format?
Do I really have to create the data (returned by CURLOPT_READFUNCTION) by myself or is there a simple way to create it in the right format?
Thanks
Test 16formpost.t is relevant. As you can see, it's completely disabled. This fact, and my fruitless experiments with various return values for the callback function, lead me to believe that the CURLOPT_READFUNCTION feature is known to be broken in the Perl binding.
I have to return the whole content through this callback. Is that right?
No; you can feed it the request body piecewise, as is suitable for chunked transfer encoding. The callback will necessarily be called several times, according to the limit set in CURLOPT_INFILESIZE.
Which data does CURLOPT_READFUNCTION callback have to return?
An HTTP request body. Since you are doing a file upload, this means Content-Type multipart/form-data. Following is an example using HTTP::Message. CURLOPT_HTTPPOST is another way to construct this format.
use HTTP::Request::Common qw(POST);
use WWW::Curl::Easy 4.14;

my $curl = WWW::Curl::Easy->new or die $!;
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_URL, 'http://localhost:5000');
$curl->setopt(CURLOPT_HTTPHEADER, [
    'Content-type: multipart/form-data', 'Transfer-Encoding: chunked'
]);

# build the multipart/form-data body once, then hand it to curl in
# 512-byte chunks; the final empty string marks the end of the upload
my $body = POST(undef, Content_Type => 'multipart/form-data', Content => [
    name    => 'upload',
    action  => 'keep',
    backup1 => [ '/tmp/backup1.zip' ], # 1st file for upload
])->content;
$curl->setopt(CURLOPT_READFUNCTION, sub {
    return substr($body, 0, 512, '');
});
my $r = $curl->perform;
The CURLOPT_READFUNCTION callback is only used for chunked transfer encoding. It may work, but I haven't been able to get it to, and I found that doing so wasn't necessary anyway.
My use case was uploading data to AWS, where it's not OK to upload the data as multipart form data. Instead, it's a straight POST of the data. It does require that you know how much data you're sending the server, though. This seems to work for me:
my $infile = 'file-to-upload.json';
my $size   = -s $infile;
open( IN, '<', $infile ) or die("Cannot open file - $infile. $!\n");

my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_HEADER, 1);
$curl->setopt(CURLOPT_NOPROGRESS, 1);
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_URL, $myPostUrl);
$curl->setopt(CURLOPT_HTTPHEADER,
    ['Content-Type: application/json']); # for my use case
$curl->setopt(CURLOPT_POSTFIELDSIZE_LARGE, $size);
$curl->setopt(CURLOPT_READDATA, \*IN);

my $retcode = $curl->perform;
if ($retcode == 0) {
    print("File upload success\n");
}
else {
    print("An error happened: $retcode " . $curl->strerror($retcode) . "\n");
}
The key is providing an open filehandle reference to CURLOPT_READDATA. After that, the core curl library handles the reads from it without any need for callbacks.
I'm trying to use Perl's WWW::Mechanize to download a file. I have to log in to the website first and then, after the form has been validated, download the file.
The thing is, after hours of trying, I haven't succeeded in doing what I want. In the end, the script saves a file which is not a zip file but an HTML file with nothing interesting in it.
Here is the script I've written:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use Crypt::SSLeay;

my $login    = "MyMail";
my $password = "MyLogin";
my $url = 'http://www.lemonde.fr/journalelectronique/donnees/protege/20101002/Le_Monde_20101002.zip';

my $bot = WWW::Mechanize->new();
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);

my $response = $bot->get($url);
$bot->form_name("formulaire");
$bot->field('login', $login);
$bot->field('password', $password);
$bot->submit();

$response = $bot->get($url);
my $filename = $response->filename;
open( my $fout, '>', $filename )
    or die("Could not create file: $!");
binmode $fout; # the download is a zip archive
print $fout $bot->response->content();
close $fout;
Could you help me find what mistakes I've made?
There are some hidden input fields which I assume are filled in when you navigate to the download using a browser rather than using a URL directly.
In addition, they are setting some cookies via JavaScript and those would not be picked up by Mechanize. However, there is a plugin WWW::Mechanize::Plugin::JavaScript which might be able to help you with that (I have no experience with it).
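If you can discover what the JavaScript sets (for example with the LiveHTTPHeaders suggestion below), one workaround is to inject the cookie into Mechanize's jar yourself; a rough sketch, where the cookie name and value are placeholders you would have to determine:
$bot->cookie_jar->set_cookie(
    0, 'cookie_name', 'cookie_value',   # placeholders
    '/', 'www.lemonde.fr', undef,
    0, 0, 86400, 0,
);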
Use LiveHTTPHeaders to see what gets submitted by the browser and replicate that (assuming you are not violating their TOS).
The problem you mention is well known in Mechanize. The simplest solution is to use the Raspo library.