Send a plain string request with LWP - perl

To get a response from a certain website, I have to give one exact request string, HTTP/1.1. I tried that one with telnet, it gives me the response I want (a redirect, but I need it).
But when I try to give the same request string to HTTP::Request->parse(), I merely get the message 400 URL must be absolute.
I am not sure if it's the website or LWP giving me that, because as I said, the response worked with telnet.
This is the code:
my $req = "GET / HTTP/1.1\n".
"Host: www.example-site.de\n".
"User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1\n".
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\n".
"Accept-Language: en-us,en;q=0.5\n".
"Accept-Encoding: gzip, deflate\n".
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\n".
"Keep-Alive: 115\n".
"Connection: keep-alive\n";
# Gives correct request string
print HTTP::Request->parse($req)->as_string;
my $ua = LWP::UserAgent->new( cookie_jar => {}, agent => '' );
my $response = $ua->request(HTTP::Request->parse($req));
# 400 error
print $response->as_string,"\n";
Anyone can help me here?

LWP::UserAgent dies with the error you are getting if there is no schema specified in request. It probably need it to properly work with it.
So, to make it work, you need to specify full url for your request:
my $req_str = "GET http://www.example.de/\n".
"User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1\n".
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\n".
"Accept-Language: en-us,en;q=0.5\n".
"Accept-Encoding: gzip, deflate\n".
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\n".
"Keep-Alive: 115\n".
"Connection: keep-alive\n";

Ok, I did it using Sockets. After all, I had the HTTP request and wanted the plain response. Here the code for people who are interested:
use IO::Sockets;
my $sock = IO::Socket::INET->new(
PeerAddr => 'www.example-site.de',
PeerPort => 80,
Proto => 'Tcp',
);
die "Could not create socket: $!\n" unless $sock;
print $sock, $req;
while(<$sock>) {
# Look for stuff I need
}
close $sock;
It's just important to remember to leave the while, as the HTTP response won't end with an EOF.

It looks to me like parsing the request isn't 100 % round-trip-safe, meaning you cannot feed the response back into a request.
Looks like a bug at first sight, but the module's been out for such a long time… On the other hand, I didn't even know you could use this module to parse a request, so maybe it's not so well tested.
The following test case should point you to the problem, which is that the URL isn't properly assembled for being fed to the $req->request method.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use Test::More;
my $host = 'www.example.com';
my $url = '/bla.html';
my $req = <<"EOS";
GET $url HTTP/1.1
Host: $host
EOS
# (1) parse the request
my $reqo = HTTP::Request->parse($req);
isa_ok $reqo, 'HTTP::Request';
diag explain $reqo;
diag $reqo->as_string;
# (2) construct the request
my $reqo2 = HTTP::Request->new( GET => "http://$host$url" );
isa_ok $reqo2, 'HTTP::Request';
diag explain $reqo2;
diag $reqo2->as_string;
is $reqo->uri, $reqo2->uri, 'both URLs are identical';
my $ua = LWP::UserAgent->new( cookie_jar => {}, agent => '' );
for ( $reqo, $reqo2 ) {
my $response = $ua->request( $_ );
diag $response->as_string,"\n";
}
done_testing;

Related

Read a web page with Perl

I am trying to read the content of a web page with perl on Windows 10. The code does not work for the following site:
https://www.dividendinvestor.com/dividend-quote/intc/
Here is the code I am using:
use LWP::Simple qw(get);
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
Any idea why I cannot read that page?
LWP::Simple is pretty basic and doesn't let you do anything clever like actually looking at the details of the response. So let's change to LWP::UserAgent and see what the response is.
use LWP::UserAgent;
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
my $resp = $ua->get($url);
print $resp->status_line;
This prints:
403 Forbidden
So I think that Quentin's comment is correct and that the site's owners are blocking people who use technology like LWP.
So let's change the useragent string to look like Internet Explorer.
use LWP::UserAgent;
my $agent = ' Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
print $resp->status_line;
Now I get:
200 OK
So we should be ok to get the content.
use LWP::UserAgent;
my $agent = ' Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
if ($resp->is_success) {
print $resp->content;
} else {
print $resp->status_line;
}
And that seems to work fine.
Note: Of course, changing the useragent string like this is rather dishonest. Presumably, the site's owners have a good reason for wanting to dissuade people from accessing their site in this way. So don't annoy them by trying to get around their restrictions. Read the site'sterms of service to see what they want to to do. Perhaps they have an API available that will give you the data you want.
As Dave Cross wrote, the problem is related to the user agent. It is possible to use the LWP::Simple module in this way:
use LWP::Simple qw/$ua get/;
$ua->agent('Mozilla/5.0');
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
As the documentation points, the user agent created by this module (LWP::Simple) will identify itself as "LWP::Simple/#.##". So we can change it before the "GET" request.

WWW::Mechanize ssl cross-site login failure cookie_jar not populating

The URL $url, redirects to https://auth.outside.com/secure/login for authentication over SSL. The site stores some cookie, as soon as you land on the page, and also some on successful authentication. However, I am not getting the cookie file populated, even when i manage to land on the page. this is an example with google, but real URL is different.
CODE
#!/usr/bin/perl
use warnings;
use strict;
use WWW::Mechanize;
use Crypt::SSLeay;
use HTTP::Cookies;
my $userAgent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0';
my $cookie_file = 'auth_cookies.txt';
$ENV{HTTPS_PROXY} = 'http://myproxy.net:8080';
my $google='https://www.google.com';
my $url = $google;
my $tempfile='download_details';
my $mech = WWW::Mechanize->new(
noproxy => 0,
agent => $userAgent,
cookie_jar => HTTP::Cookies->new( file => $cookie_file )
);
my $result=$mech->get( $url, ':content_file' => $tempfile );
print sprintf( "User-Agent %s\n redirects to: %s\n\n", $userAgent, $mech->uri() );
print "result=$result\n";
outputs following:
User-Agent Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
redirects to: https://www.google.com
result=HTTP::Response=HASH(0x3474ef0)
but does not create any cookie file even thou i can see a bunch of cookies in firebug.
after adding this code, the file is populating...
$mech->cookie_jar->set_cookie(
qw(
3
cat
buster
/
.example.com
0
0
0
)
);

Cannot login with UserAgent

I have managed to login with the code below. Now I can do it ony once a day.
And then I cant login, but get the login page in the response.
But when i print $reqstr from the code below and paste it to browser(like firefox), I can log in.
Wget doesnt work neiter. Only normal browser.
Soemtimes it seems , that Im logged in, but only get such content:
"<html>\cJ<head>\cJ\cI<meta http-equiv=\"content-type\" content=\"text/html; charset=ISO-8859-1\"><meta http-equiv=\"expires\" content=\"0\"><meta http-equiv=\"pragma\" content=\"no-cache\">\cJ\cI<meta http-equiv=\"refresh\" content=\"0; URL='https://www.address.com/'\">\cJ</head>\cJ</html>\cJ"
I also noticed, that while I cant login, Im getting this part in a debugger:
_uri_canonical' => URI::https=SCALAR(0x17dad28)
-> REUSED_ADDRESS
'handlers' => HASH(0x22dc0c0)
'response_data' => ARRAY(0x22ee8b8)
0 HASH(0x22d9a48)
'callback' => CODE(0x22dba30)
-> &LWP::UserAgent::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:682] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:679-682
1 HASH(0x22eea08)
'callback' => CODE(0x22d9cb8)
-> &LWP::Protocol::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:138] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:135-138
Any clue?
Here the code:
my $b = LWP::UserAgent->new(agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Firefox/31.2.0',);
my $cookie_jar = HTTP::Cookies->new(
file => 'lwp_cookies.txt',
autosave => 1,
ignore_discard => 1,
);
$cookie_jar->clear;
$cookie_jar->clear_temporary_cookies;
$b->cookie_jar($cookie_jar);
my $url = "https://www.address.com";
my $r = $b->get($url);
$r->decoded_content =~ /FORM ACTION="(.*?)" METHOD/msgi;
my $a = "$url$1";
print $a."\n";
my $reqstr = $a."&LoginAction=Login&Number=55555&KPassword=passw&UserID=uid";
my $req = HTTP::Request->new(POST => $reqstr);
$req->header('Host', 'www.address.com');
$req->header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0');
$req->header('Connection', 'keep-alive');
$req->header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
my $c = $b->request($req);
You need to re-request that page with the referrer added via referer() for LWP::UserAgent (or see my second answer if you aren't wedded to that module)
sub login { # Code not tested and not really compilable, just a stub for you
my (#other_args, $url, $referrer_url) = #_;
# Add your login code from the question, up to calling $b->request()
$req->referer($referrer_url) if $referrer_url;
my $c = $b->request($req);
return $c; # Or return the response?
}
my $result1 = login($original_login_url); #first try
# Obtain the redirect_url from the response.
# If it was a 301 redirect, you can do it via
# my #redirects = $response->redirects();
my $referrer_url = $original_login_url;
my $result2 = login($redirect_url, $referrer_url);
References:
http://forums.devshed.com/perl-programming-6/lwp-meta-refresh-tag-handling-63484.html
http://www.herongyang.com/Perl/LWP-UserAgent-Follow-HTTP-Redirects.html
If you aren't dead set on using LWP::UserAgent, use WWW::Mechanize instead.
Best approach: use WWW::Mechanize::Plugin::FollowMetaRedirect. The SYNOPSIS is pretty short and to the point:
use WWW::Mechanize;
use WWW::Mechanize::Plugin::FollowMetaRedirect;
my $mech = WWW::Mechanize->new;
$mech->get( $url );
$mech->follow_meta_redirect;
# Optionally, skip emulating the waiting time
$mech->follow_meta_redirect( ignore_wait => 1 );
If you don't have access to that module, you can create your own, similar to this: http://www.perlmonks.org/?node_id=487286
(Basically, parse the returned content using the regex shudder to extract the refresh URL, and get that URL. As per my other answer, you might need to add the referrer header)

Form a CURL request to PayMill in Perl

I'm out of my depth with curl.
I want to integrate PayMill into my site (which is written in Perl).
There isn't a Perl lib for Paymill yet, so I need to connect to them via curl.
I have completed the front end JS Paymill integration, and received a payment token from PayMill.
I now need to pass the token received from Paymill to my backend and use curl to ask PayMill to complete the transaction and charge the user.
At this point I'm stuck.
To make a transaction, the PayMill documentation says that I must do the following:
curl https://api.paymill.de/v2/transactions \
-u b94a7550bd908877cbae5d3cf0dc4b74: \
-d "amount=4200" \
-d "currency=EUR" \
-d "token=098f6bcd4621d373cade4e832627b4f6" \
-d "description=Test Transaction"
I believe -u is the Paymill secret key to authenticate my request although the documentation is not clear here.
I've had a look at WWW::Curl::Easy, Net:Curl::Easy and LWP::Curl, however nothing in the documentation for those methods makes it obvious to me how to form the query above.
I've tried (without really believing it would work), simply encoding a string in perl as described above;
my $request = '-u ' . $private_key . " ";
foreach my $key (keys %$params_in) {
$request .= '-d "' . lc($key) .'='.$params_in->{$key} . ' ';
}
And then passing $request to my attempt at curl as follows;
my $curl = WWW::Curl::Easy->new;
$curl->setopt(WWW::Curl::Easy::CURLOPT_HEADER(), 1);
$curl->setopt(WWW::Curl::Easy::CURLOPT_URL(), $paymill_server);
$curl->setopt(WWW::Curl::Easy::CURLOPT_POST(), 1);
$curl->setopt(WWW::Curl::Easy::CURLOPT_POSTFIELDS(), $request);
my $response;
$curl->setopt(WWW::Curl::Easy::CURLOPT_WRITEDATA(), \$response);
my $retcode = $curl->perform;
however that fails with an Access Denied error, which I assume is because Paymill is not finding my key because I'm messing up the Curl (assuming -u is supposed to be the secret_key).
I feel I'm missing something obvious here.
Could someone point me in the right direction re how to do this?
Thanks
UPDATE
Excellent answers, thanks everyone for your help, it's working now. I went with Matthias's solution in the end and the final complete solution for making a transaction looked as follows;
use LWP::UserAgent;
use MIME::Base64;
use JSON::XS;
my $ua = LWP::UserAgent->new;
$ua->default_header(Authorization => "Basic " . encode_base64(private_key));
my $response = $ua->post(https://api.paymill.de:443/v2/transactions , $params );
if ( $response->is_success ) {
my $obj = eval { decode_json $response->content } || {};
etc
}
Like other answers propose the best way would be using LWP::UserAgent for doing the requests.
Edit: Since PAYMILL is sending challenge responses since a while now I updated the code.
Since Paymill doesn't comply with RFC 2616, Section 14.47 (the API isn't sending a challenge response) LWP::UserAgent and similar are failing in sending a second request with the credentials. The solution is to "force" LWP::UserAgent to send the credentials with the first request by adding them as header:
use LWP::UserAgent;
use MIME::Base64;
my $ua = LWP::UserAgent->new;
# Use the following line no longer:
# $ua->default_header(Authorization => "Basic " . encode_base64("your PRIVATE key"))
$ua->credentials('api.paymill.de:443', '', 'YOUR PRIVATE KEY');
# Dumping only
use Data::Dumper;
print Dumper($ua->get("https://api.paymill.de:443/v2/clients"));
Disclosure: I work at Paymill.
I don't know if the authentication part with the user/password and your token is correct as I don't know what the 'Realm' is supposed to be. Still, have a go with LWP. It's not that I don't like Curl, I just don't know it, but I do know LWP.
use strict; use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->credentials(
'api.paymill.de:80',
'Realm?',
'b94a7550bd908877cbae5d3cf0dc4b74'
);
my $response = $ua->post(
' https://api.paymill.de/v2/transactions',
{
amount => "4200",
currency => "EUR",
token => "098f6bcd4621d373cade4e832627b4f6",
description => "Test Transaction",
}
);
if ( $response->is_success ) {
print $response->decoded_content; # or whatever
} else {
die $response->status_line;
}
Edit: I read a little in the Paymill documentation. It says:
Authentication
Example
% curl https://api.paymill.de/v2/clients \
-u e73fa5e7b87620585b5ea5d73c4d23bb:
To authenticate at the Paymill API, you need the private key of your
test or live account. You have to use http basic access
authentification. Your key has to be set as the username. A password
isn’t required and you don’t have to insert one. But if you want, feel
free to insert an arbitrary string.
Note
Please keep your private keys secure and don’t pass them to anybody. These private keys have extreme secure information for
handling the transactions of your shop.
All your requests must be made via https. Requests which will be made in another way will fail. This is for security reasons of the
submitted data.
There is also a link to http://en.wikipedia.org/wiki/HTTP_Secure, which clears up the -u part pretty much I believe.
You can use LWP::Protocol::Net::Curl to integrate LWP and libcurl organically. Check this:
#!/usr/bin/env perl
use common::sense;
use Data::Printer;
use JSON::XS;
use LWP::Protocol::Net::Curl verbose => 1;
use LWP::UserAgent;
# create user agent
my $ua = LWP::UserAgent->new;
# POST request
my $res = $ua->post(
'https://b94a7550bd908877cbae5d3cf0dc4b74:#api.paymill.de/v2/transactions',
'Accept-Encoding' => 'gzip',
Content => {
amount => 4200,
currency => 'EUR',
token => '098f6bcd4621d373cade4e832627b4f6',
description => 'Test Transaction',
},
);
# parse received data
my $obj = eval { decode_json $res->content } // {};
# output
p $obj;
The output:
* About to connect() to api.paymill.de port 443 (#0)
* Trying 62.138.241.3...
* Connected to api.paymill.de (62.138.241.3) port 443 (#0)
* Connected to api.paymill.de (62.138.241.3) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: none
* SSL connection using RC4-SHA
* Server certificate:
* subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.paymill.de
* start date: 2012-07
* expire date: 2013-10
* subjectAltName: api.paymill.de matched
* issuer: C=GB; S
* SSL certificate verify ok.
* Server auth using Basic with user 'b94a7550bd908877cbae5d3cf0dc4b74'
> POST /v2/transactions HTTP/1.1
Authorization: Basic Yjk0YTc1NTBiZDkwODg3N2NiYWU1ZDNjZjBkYzRiNzQ6
User-Agent: libwww-perl/6.04 libcurl/7.28.0 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 libssh2/1.2.8
Host: api.paymill.de
Accept: */*
Accept-Encoding: gzip
Content-Length: 92
Content-Type: application/x-www-form-urlencoded
* upload completely sent off: 92 out of 92 bytes
< HTTP/1.1 200 OK
< Server: nginx
< Date: Wed, 09 Jan 2013 17:22:54 GMT
< Content-Type: application/json
< Transfer-Encoding: chunked
< Connection: close
< Set-Cookie: PHPSESSID=rmdo5a8c6u107gma28lotmmn24; path=/
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< X-Server: hrtt-frn5-de13
<
* Closing connection #0
Printing in line 28 of paymill.pl:
\ {
data {
amount 4200,
client {
created_at 1357752174,
description undef,
email undef,
id "client_85cb0bfc837f31c81015",
payment [],
subscription undef,
updated_at 1357752174
},
created_at 1357752174,
currency "EUR",
description "Test Transaction",
id "tran_c672daa0538e2a04e919",
livemode false,
origin_amount 4200,
payment {
card_holder undef,
card_type "visa",
client "client_85cb0bfc837f31c81015",
country undef,
created_at 1357752174,
expire_month 12,
expire_year 2014,
id "pay_2732689f44928301c769",
last4 1111,
type "creditcard",
updated_at 1357752174
},
preauthorization undef,
refunds undef,
status "closed",
updated_at 1357752174
},
mode "test"
}

How do I change the order of HTTP request headers sent by Perl's LWP?

For a test i need to do a get requets to a website - unfortunatly when using perl lwp the "connection" appears in the header b4 the host. As a result the request gets filtered by the web application firewall. All i need is to remove or move down the connection line in the header. When i do the requets with my script:
use warnings;
use IO::Socket;
use LWP::UserAgent;
use LWP::Protocol::http;
use HTTP::Request;
my $ua = LWP::UserAgent->new();
push(#LWP::Protocol::http::EXTRA_SOCK_OPTS, SendTE => 0, PeerHTTPVersion => "1.1");
$ua->default_header(Cookie => 'XXX', User-Agent => 'whateva');
my $request = $ua->get('https://www.test.com/test.html?...');
....
The header looks like this:
GET /test.html?... HTTP/1.1
Connection: close
Host: www.test.com
User-Agent: whateva
Cookie: XXXX
BUT it should look like this to work (conenction comes after host):
GET /test.html?... HTTP/1.1
Host: www.test.com
Connection: close
User-Agent: whateva
Cookie: XXXX
How do i get rid of that connection line in LWP? I just need to re-oder it....Its not that it needs to be completly removed; I am happy to add it later in there again as
# $userAgent->default_header ("Connection" => "keep-alive");..
Thx a lot in advance!
To work around the bug in your firewall*, change
return _bytes(join($CRLF, "$method $uri HTTP/$ver", #h2, #h, "", $content));
in Net/HTTP.pm to
my #h3 = ( #h2, #h );
if (my ($idx) = grep /^Host:/, 0..$#h3) {
unshift(#h3, splice(#h3, $idx, 1));
}
return _bytes(join($CRLF, "$method $uri HTTP/$ver", #h3, "", $content));
* — According to the HTTP/1.1 spec, RFC 2616, "The order in which header fields with differing field names are received is not significant."