Adding headers to an LWP post in Perl

I have taken over some perl code and been asked to add a keep-alive header to the LWP post that happens.
Google tells me how to do this for certain setups, but I can't see how to do it for the way this code has been written. All the info I can find works on the basis of creating the LWP object, then creating the POST and its parameters, then adding the headers, and only then actually sending the request. However, in the code I have to deal with, creating the POST, setting its parameters and sending it all happen in one line:
my $ua = LWP::UserAgent->new;
my $response = $ua->post( $URL, ['parm1'=>'val1']);
How/where can I add the headers in this setup, or do I need to re-write as per the examples I have found?

The LWP::UserAgent page tells you how to do this. You would set the handler request_prepare on the user agent object. That will pass you in the request object before it posts.
Actually, anything you put as a list of key-value pairs before the key 'Content' followed by the structure that you want to post, will translate into headers, per HTTP::Request::Common::POST
$ua->post( $URL, keep_alive => 1, Content => ['parm1'=>'val1']);
Or without the content tag, if you put the structure first, you can put header key-value pairs after:
$ua->post( $URL, ['parm1'=>'val1'], keep_alive => 1 );
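To check which headers such a call would produce without sending anything, you can build the same request with HTTP::Request::Common::POST and print it. This is only a minimal sketch: the URL is a placeholder, and Connection: keep-alive is just an example header, not necessarily the one you were asked for.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw( POST );

# Placeholder URL (an assumption for illustration).
my $url = 'http://example.com/form';

# Key-value pairs before 'Content' become request headers;
# the array ref after 'Content' becomes the form body.
my $request = POST(
    $url,
    Connection => 'keep-alive',
    Content    => [ parm1 => 'val1' ],
);

# Dump the request line, headers and body as they would be sent.
print $request->as_string;
```

The same argument list can be passed to $ua->post( ... ) once the printed request looks right.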

Did they really ask you to add only a keep-alive header, or did they ask you to support keep-alive, i.e. multiple HTTP requests within the same TCP connection? In the latter case you should use (according to the documentation of LWP::UserAgent):
my $ua = LWP::UserAgent->new( keep_alive => 10 );
$ua->get('http://foo.bar/page1');
$ua->get('http://foo.bar/page2'); # reuses connection from previous request
In this case it will keep at most 10 connections open at the same time. If you only send requests to the same site, you can also set it to 1 so that it reuses the same TCP connection for all requests.
A Keep-Alive header on its own has no meaning. What keep_alive => 1 within the user agent does is set up a connection cache and add a "Connection: keep-alive" header (with HTTP/1.1, keep-alive is implicit, so it does not need to add the header for HTTP/1.1 requests).
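If you want to confirm that the connection cache is really set up, you can inspect the user agent without making any request. A small sketch, assuming only the documented conn_cache accessor and the total_capacity method of LWP::ConnCache:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# keep_alive => 1 asks LWP to create a connection cache with capacity 1.
my $ua    = LWP::UserAgent->new( keep_alive => 1 );
my $cache = $ua->conn_cache;
print ref($cache), ' with capacity ', $cache->total_capacity, "\n";

# Without keep_alive there is no cache, so every request opens a new connection.
my $plain = LWP::UserAgent->new;
print( ( defined $plain->conn_cache ) ? "cache\n" : "no cache\n" );
```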

Related

Send HTTP streaming request in perl

I want to send an XML request using HTTP streaming, where the Transfer-Encoding is "chunked". Currently I am using LWP::UserAgent to send the XML transaction.
my $userAgent = LWP::UserAgent->new;
my $starttime = time();
my $response = $userAgent->request(POST $url,
Content_Type => 'application/xml',
Transfer_Encoding => 'Chunked',
Content => $xml);
print "Response".Dumper($response);
But I am getting HTTP status code 411 Length Required, which means the server refuses to accept the request without a defined Content-Length.
How can we handle this when sending a chunked request?
LWP::UserAgent's API isn't designed to send a stream, but it is able to do so with minimal hacking.
use strict;
use warnings qw( all );
use HTTP::Request::Common qw( POST );
use LWP::UserAgent qw( );
my $ua = LWP::UserAgent->new();
# Don't provide any content.
my $request = POST('http://stackoverflow.org/junk',
Content_Type => 'application/xml',
);
# POST() insists on adding a Content-Length header.
# We need to remove it to get a chunked request.
$request->headers->remove_header('Content-Length');
# Here's where we provide the stream generator.
my $buffer = "abc\n"; # Double quotes, so that \n is a real newline.
$request->content(sub {
return undef if !length($buffer); # Return undef when done.
return substr($buffer, 0, length($buffer), ''); # Return a chunk of data otherwise.
});
my $response = $ua->request($request);
print($response->status_line);
Using a proxy (Fiddler), we can see this does indeed send a chunked request.
There's no point in using a chunked request if you already have the entire document handy, as in the example you gave. Instead, let's say you wanted to upload the output of some external tool as it is produced. To do that, you could use the following:
open(my $pipe, '-|:raw', 'some_tool') or die "Can't run some_tool: $!";
$request->content(sub {
my $rv = sysread($pipe, my $buf, 64*1024);
die $! if !defined($rv);
return undef if !$rv;
return $buf;
});
"But i am getting http status code 411 Length Required."
Not all servers understand a request with a chunked payload even though this is standardized in HTTP/1.1 (but not in HTTP/1.0). For example nginx only supports chunking within a request since version 1.3.9 (2012), see Is there a way to avoid nginx 411 Content-Length required errors?. If the server does not understand a request with chunked encoding there is nothing you can do from the client side, i.e. you simply cannot use chunked transfer encoding then. If you have control over the server make sure that the server actually supports it.
I've also never seen browsers send such requests, probably because they cannot guarantee that the server will support them. I've only seen chunked requests used by mobile apps where the server and the app are managed by the same party, so that support for chunked requests can be guaranteed.

Perl get request returns empty response, maybe session related?

I was using an open source tool called SimTT which gets an URL of a tabletennis league and then calculates the probable results (e.g. ranking of teams and players). Unfortunately the webpage moved to a different webpage.
I downloaded the open source and repaired the parsing of the webpage, but currently I'm only able to download the page manually and read it then from a file.
Below you can find an excerpt of my code to retrieve the page. It prints success, but the response is empty. Unfortunately I'm not very familiar with Perl and web techniques, but in Wireshark I could see that one of the last things sent was a new session key. I'm not sure whether the problem is related to cookies, SSL or something like that.
It would be very nice if someone could help me to get access. I know that there are some people out there who would like to use the tool.
So here's the code:
use LWP::UserAgent ();
use HTTP::Cookies;
my $ua = LWP::UserAgent->new(keep_alive=>1);
$ua->agent('Mozilla/5.0');
$ua->cookie_jar({});
my $request = HTTP::Request->new('GET', 'https://www.mytischtennis.de/clicktt/ByTTV/18-19/ligen/Bezirksoberliga/gruppe/323819/mannschaftsmeldungen/vr');
my $response = $ua->request($request);
if ($response->is_success) {
print "Success: ", $response->decoded_content;
}
else {
die $response->status_line;
}
Either there is some rudimentary anti-bot protection at the server, or the server is misconfigured or otherwise broken. It looks like it expects an Accept-Encoding header in the request, which LWP does not send by default. The value of this header does not really seem to matter: the server will send the content compressed with gzip if the client claims to support it, but it will send uncompressed data if the client offers only a compression method that is unknown to the server.
With this knowledge one can change the code like this:
my $request = HTTP::Request->new('GET',
'https://www.mytischtennis.de/clicktt/ByTTV/18-19/ligen/Bezirksoberliga/gruppe/323819/mannschaftsmeldungen/vr',
[ 'Accept-Encoding' => 'foobar' ]
);
With this simple change the code currently works for me. Note that this might change at any time if the server setup changes, i.e. it might then need other workarounds.
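Instead of a made-up value, you could also advertise the encodings that LWP can actually undo in decoded_content, using HTTP::Message::decodable(). This is only a sketch: whether this particular server accepts that value is an assumption, since all that was observed is that some Accept-Encoding header needs to be present.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Message;

my $url = 'https://www.mytischtennis.de/clicktt/ByTTV/18-19/ligen/Bezirksoberliga/gruppe/323819/mannschaftsmeldungen/vr';

# decodable() lists the content encodings this installation can decode;
# in scalar context it returns them as one comma-separated string.
my $request = HTTP::Request->new(
    'GET',
    $url,
    [ 'Accept-Encoding' => scalar HTTP::Message::decodable() ],
);

# Show the request that would be sent; nothing goes over the network here.
print $request->as_string;
```

The request can then be passed to $ua->request($request) as before, and $response->decoded_content takes care of any gzip compression the server applies.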

Guzzle not sending PSR-7 POST body correctly

It is either not being sent, or not being received correctly. Using curl direct from the command line (using the -d option) or from PHP (using CURLOPT_POSTFIELDS) does work.
I start with a PSR-7 request:
$request = new \GuzzleHttp\Psr7\Request('POST', $url);
I add authentication header, which authenticates against the API correctly:
$request = $request->withHeader('Authorization', 'Bearer ' . $accessToken);
Then I add the request body:
// The parameter for the API function
$body = \GuzzleHttp\Psr7\stream_for('args=dot');
$request = $request->withBody($body);
I can send the message to the API:
$client = new \GuzzleHttp\Client();
$response = $client->send($request, ['timeout' => 2]);
The response I get back indicates that the "args" parameter was simply not seen by the API. I have tried moving the authentication token to the args:
'args=dot&access_token=123456789'
This should work, and does work with curl from the command line (-d access_token=123456789), but the API also fails to see that parameter when sending via Guzzle (6.x) as above.
I can see the message does contain the body:
var_dump((string)$request->getBody());
// string(8) "args=dot"
// The "=" is NOT URL-encoded in any way.
So what could be going wrong here? Are the parameters not being sent, or are they being sent in the wrong format (maybe '=' is being encoded?), or is perhaps the wrong content-type being used? It is difficult to see what is being sent "on the wire" when using Guzzle, since the HTTP message is formatted and sent many layers deep.
Edit: Calling up a local test script instead of the remote API, I get this raw message detail:
POST
CONNECTION: close
CONTENT-LENGTH: 62
HOST: acadweb.co.uk
USER-AGENT: GuzzleHttp/6.1.1 curl/7.19.7 PHP/5.5.9
args=dot&access_token=5e09d638965288937dfa0ca36366c9f8a44d4f3e
So it looks like the body is being sent, so I guess something else is missing to tell the remote API how to interpret that body.
Edit: the command-line curl that does work, sent to the same test script, gives me two additional header fields in the request:
CONTENT-TYPE: application/x-www-form-urlencoded
ACCEPT: */*
I'm going to guess it is the Content-Type header, which is missing from the Guzzle request, that is the source of the problem. So is this a Guzzle bug? Should it not always send a Content-Type, based on the assumptions it makes that are listed in the documentation?
The Content-Type header was the issue. Normally, Guzzle will hold your hand and insert headers it deems necessary, and makes a good guess at the Content-Type based on what you have given it, and how you have given it.
With Guzzle's PSR-7 messages, none of that hand-holding is done. It strictly leaves all the headers for you to handle. So when adding POST parameters to a PSR-7 Request, you must explicitly set the Content-Type:
$params = ['Foo' => 'Bar'];
$body = \GuzzleHttp\Psr7\stream_for(http_build_query($params));
$request = $request->withBody($body);
$request = $request->withHeader('Content-Type', 'application/x-www-form-urlencoded');
The ability to pass in the params as an array and to leave Guzzle to work out the rest, does not apply to Guzzle's PSR-7 implementation. It's a bit clumsy, as you need to serialise the POST parameters into a HTTP query string, and then stick that into a stream, but there you have it. There may be an easier way to handle this (e.g. a wrapper class I'm not aware of), and I'll wait and see if any come up before accepting this answer.
Be aware also that if constructing a multipart/form-data Request message, you need to add the boundary string to the Content-Type:
$request = $request->withHeader('Content-Type', 'multipart/form-data; boundary=' . $boundary);
Where $boundary can be something like uniqid() and is also used in constructing the multipart body.
The GuzzleHttp\Client provides all necessary wrapping.
$response = $client->post(
$uri,
[
'headers' => ['Authorization' => 'Bearer ' . $token], // 'auth' is for username/password schemes
'form_params' => $parameters,
]);
Documentation is available under Guzzle Request Options.
Edit: However, if your requests are being used within GuzzleHttp\Pool, then you can simplify everything into the following:
$request = new GuzzleHttp\Psr7\Request(
'POST',
$uri,
[
'Authorization' => 'Bearer ' . $token,
'Content-Type' => 'application/x-www-form-urlencoded'
],
http_build_query($form_params, '', '&')
);

How to detect a changed webpage?

In my application, I fetch webpages periodically using LWP. Is there any way to check whether the webpage has changed in some respect between two consecutive fetches (other than explicitly doing a comparison)? Is there any signature (say, a CRC) generated at lower protocol layers that can be extracted and compared against older signatures to detect possible changes?
There are two possible approaches. One is to use a digest of the page, e.g.
use strict;
use warnings;
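use Digest::MD5 'md5_hex';
use Encode 'encode_utf8';
use LWP::UserAgent;
# fetch the page, etc.
# md5_hex works on bytes, so encode the decoded text first.
my $digest = md5_hex( encode_utf8( $response->decoded_content ) );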
if ( $digest ne $saved_digest ) {
# the page has changed.
}
Another option is to use an HTTP ETag, if the server provides one for the resource requested. You can simply store it and then set your request headers to include an If-None-Match field on subsequent requests. If the server ETag has remained the same, you'll get a 304 Not Modified status and an empty response body. Otherwise you'll get the new page. (And new ETag.) See Entity Tags in RFC2616.
Of course, the server could be lying, and sending the same ETag even though the content has changed. There's no way to know unless you look.
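The ETag approach could look roughly like the following. It is a sketch under assumptions: conditional_get and page_changed are hypothetical helper names, the URL and ETag values are placeholders, and the two responses are built by hand so the example runs without a network (in real code they would come from $ua->request($request)).

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: build a GET that carries the ETag saved last time.
sub conditional_get {
    my ($url, $saved_etag) = @_;
    my $request = HTTP::Request->new( GET => $url );
    $request->header( 'If-None-Match' => $saved_etag ) if defined $saved_etag;
    return $request;
}

# Hypothetical helper: a 304 means the server says the saved copy is still current.
sub page_changed {
    my ($response) = @_;
    return $response->code != 304;
}

my $request = conditional_get( 'http://example.com/', '"abc123"' );

# Stand-in responses; a real program would call $ua->request($request).
my $not_modified = HTTP::Response->new( 304, 'Not Modified' );
my $fresh        = HTTP::Response->new( 200, 'OK', [ ETag => '"def456"' ], 'new body' );

for my $response ( $not_modified, $fresh ) {
    if ( page_changed($response) ) {
        # Save the new ETag (if any) for the next fetch.
        print 'changed, new ETag: ', $response->header('ETag') // 'none', "\n";
    }
    else {
        print "unchanged\n";
    }
}
```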
You should use the If-Modified-Since request header, noting the gotchas in the RFC. You send this header with the request. If the server supports it and thinks the content is newer, it sends it to you. If it thinks you have the most recent version, it returns a 304 with no message body.
However, as other answers have noted, the server doesn't have to tell you the truth, so you're sometimes stuck downloading the content and checking yourself. Many dynamic things will always claim to have new content because many developers have never thought about supporting basic HTTP things in their web apps.
For the LWP bits, you can create a single request with an extra header:
use HTTP::Request;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new( GET => $url );
$request->header( 'If-Modified-Since' => $time ); # $time must be an HTTP date string
$ua->request( $request );
For all requests, you can set a request handler:
$ua->add_handler(
request_send => sub {
my($request, $ua, $h) = @_;
# ... look up time from local store
$request->header( 'If-Modified-Since' => $time );
return; # return nothing, so LWP goes on to send the request
}
);
However, LWP can do most of this for you with mirror if you want to save the files:
$ua->mirror( $url, $filename );

Why does some header information in a CGI.pm object persist while another does not?

My colleagues and I are maintaining and developing a Perl web-project that works via mod_perl.
Now we are going through a major legacy code refactoring in which we have implemented some sort of an MVC pattern.
Among other things, my task is to make sure that all HTTP response headers are processed and sent back to the browser inside the main controller. For example, if a redirect is required, a page handler throws an exception, then the main controller catches it and generates the corresponding headers.
It all looked well until I started to implement cookie handling. Before that our code just printed cookie headers to output when it was required, like so:
# $response is an instance of the CGI class
print $response->redirect(
-uri => "/some_uri/",
-cookie => $response->cookie(
-name => 'user_id',
-value => $user->{'id'},
-path => '/', -expires => '+1M'));
And now I want the $response object to store that information, so I can later send all headers together. I thought that it would go something like that:
sub page_handler {
# ...
$response->cookie(-name => 'user_id',
-value => $user->{'id'},
-path => '/', -expires => '+1M');
return;
}
# And then, inside the controller
sub controller {
# ...
# the same $response instance
print $response->header();
print $output;
# ....
exit();
}
But it seems that the CGI class object doesn't store all headers that it creates with the header method. Some headers seem to persist, while others do not, here is what I get in re.pl:
$ use CGI;
$ my $response = CGI->new();
$CGI1 = CGI=HASH(0xa6efba0);
$ $response->header();
Content-Type: text/html; charset=ISO-8859-1
$ $response->header(-type => 'text/plain', -charset => 'UTF-8', -status => '200 OK');
Status: 200 OK
Content-Type: text/plain; charset=UTF-8
$ $response->header();
Content-Type: text/html; charset=UTF-8
I expected the last output to be either the same as the previous one, or the same as the first one, where I have not yet set any headers. I did not expect it to change partially.
That is why I ask my question: Why does some header information in a CGI.pm object persist while another does not?
Am I using the object incorrectly? Is there a way I could use it the way I intended to?
PS: Sorry for the long question, I wanted to make sure you understand what I want to do.
PPS: Also, I know that many people around here recommend going away from CGI and using Catalyst. This is, I am afraid, not an option right now, because we have too much legacy code, and we are hoping to get away from mod_perl altogether. This is required only for a certain feature.
To answer your question, the header method doesn't store any information, nothing is persistent.
With your example of the header 'object' persisting, reading TFM helps:
The -charset parameter can be used to control the character set sent to the browser. If not provided, defaults to ISO-8859-1. As a side effect, this sets the charset() method as well. [emphasis mine]
After you call header with some parameters, then call it as default, the only thing to 'persist' is the character set.
For your cookie problem, I think you'd have to store the return value of $response->cookie() somewhere. TFM doesn't say that the cookie() sub stores the data anywhere; it just says that it creates a cookie.
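One way to do that, sketched under assumptions: the page handler pushes the cookies it creates onto a list (the @cookies array here is a hypothetical stand-in for wherever your controller keeps per-request state), and the controller later passes the whole list to header() via -cookie. The user id 42 is a placeholder value.

```perl
use strict;
use warnings;
use CGI ();

# An empty hash ref initializer keeps CGI from looking at the environment.
my $response = CGI->new( {} );

# Hypothetical per-request store for cookies created by page handlers.
my @cookies;

# In the page handler: create the cookie and remember it.
push @cookies, $response->cookie(
    -name    => 'user_id',
    -value   => 42,
    -path    => '/',
    -expires => '+1M',
);

# In the controller: emit all headers in one place.
my $headers = $response->header(
    -type    => 'text/html',
    -charset => 'UTF-8',
    -cookie  => \@cookies,
);
print $headers;
```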
I agree with Sinan though - throwing exceptions is crazy talk, especially to cover CGI.pm's redirect sub. I'd rethink that one. Or go completely the other way and write the whole webapp using only exception handling - there'd be some good laughs along the way :o)