The following code ...
my $user_agent = LWP::UserAgent->new;
my $request = HTTP::Request->new(GET => $url);
my $response = $user_agent->request($request);
if ($response->is_success) {
print "OK\n";
} else {
die($response->status_line);
}
.. will fail with ..
500 Can't connect to <hostname> (Bad hostname '<hostname>')
.. if the hostname in $url is IPv6-only (that is, it has an AAAA record but no A record).
My questions are:
How do I enable IPv6 support in LWP?
How do I configure LWP's settings for "prefer-IPv4-over-IPv6" (A vs. AAAA) / "prefer-IPv6-over-IPv4" (AAAA vs. A)?
It looks like you just need to use Net::INET6Glue::INET_is_INET6. To quote its example:
use Net::INET6Glue::INET_is_INET6;
use LWP::Simple;
print get( 'http://[::1]:80' );
print get( 'http://ipv6.google.com' );
I believe you'll have to change the module to use the IPv6 socket module; IPv6 support is not enabled by default: http://eintr.blogspot.com/2009/03/bad-state-of-ipv6-in-perl.html. I don't believe there is anything as simple as a "prefer-ipv6" setting.
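To check whether a host really falls into the AAAA-only case described in the question, it can help to look at what the resolver returns before involving LWP at all. The following is a minimal sketch using getaddrinfo from the core Socket module (available in the Socket shipped with Perl 5.14 and later); the helper name address_families is made up for this illustration and is not part of LWP.

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo AF_INET AF_INET6 SOCK_STREAM);

# Hypothetical helper: report which address families a host resolves to.
# Returns a sorted list containing 'IPv4' and/or 'IPv6', or an empty
# list if the name cannot be resolved.
sub address_families {
    my ($host) = @_;
    my ($err, @results) = getaddrinfo($host, "", { socktype => SOCK_STREAM });
    return () if $err;
    my %seen;
    for my $ai (@results) {
        $seen{IPv4} = 1 if $ai->{family} == AF_INET;
        $seen{IPv6} = 1 if $ai->{family} == AF_INET6;
    }
    return sort keys %seen;
}

# Literal addresses are used here so the example does not need DNS.
for my $host ('127.0.0.1', '::1') {
    my @families = address_families($host);
    print "$host: ", (@families ? join(', ', @families) : 'unresolvable'), "\n";
}
```

A host for which this returns only IPv6 is the case that produces the "Bad hostname" error when the underlying socket class only understands IPv4.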
Debian Wheezy (perl 5.14)
Works fine:
use LWP::Simple;
print get( 'http://ip6-localhost:80' );
Not working (1)
use LWP::Simple;
print get( 'http://[::1]:80' );
Not working (2) [Return: Bad hostname]
use LWP::Simple;
$ua = new LWP::UserAgent();
my $req = new HTTP::Request("GET", "http://[::1]/");
my $res = $ua->request($req);
Not working (3) [Return: Connection refused]
use Net::INET6Glue::INET_is_INET6;
use LWP::Simple;
$ua = new LWP::UserAgent();
my $req = new HTTP::Request("GET", "http://[::1]/");
my $res = $ua->request($req);
So, as long as you don't need a literal IPv6 address in the HTTP request, it's fine. :(
Related
I am trying to send multiple HTTP GET requests using Perl. I need to send those requests through a SOCKS proxy.
If I use the following code, I am able to send a request through the SOCKS proxy:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new(
agent => q{Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.2.0; .NET CLR 1.1.4322)},
);
$ua->proxy([qw/ http https /] => 'socks://localhost:9050'); # Tor proxy
#$ua->cookie_jar({});
$a = 10;
while ( $a < 20 ) {
my $rsp = $ua->get('http://example.com/type?parameter=1&parameter=2');
print $rsp->content;
$a = $a + 1;
}
This works, but I need to use AnyEvent to send multiple GET requests in parallel:
#!/usr/bin/perl
use strict;
use AnyEvent;
use AnyEvent::HTTP;
use Time::HiRes qw(time);
use LWP::Protocol::socks;
use AnyEvent::Socket;
my $cv = AnyEvent->condvar( cb => sub {
warn "done";
});
my $urls = [
"http://url-1-withallparameters",
"http://url-2-withallparameters",
"http://url-3-withallparameters",
];
my $start = time;
my $result;
$cv->begin(sub { shift->send($result) });
for my $url ( @$urls ) {
$cv->begin;
my $now = time;
my $request;
$request = http_request(
GET => $url,
timeout => 2, # seconds
sub {
my ($body, $hdr) = @_;
if ($hdr->{Status} =~ /^2/) {
push (@$result,
join("\t",
($url,
" has length ",
$hdr->{'content-length'},
" and loaded in ",
time - $now, # Time::HiRes::time returns seconds
"s"))
);
}
else {
push (@$result,
join("",
"Error for ",
$url,
": (",
$hdr->{Status},
") ",
$hdr->{Reason})
);
}
undef $request;
$cv->end;
}
);
}
$cv->end;
warn "End of loop\n";
my $foo = $cv->recv;
print join("\n", @$foo), "\n" if defined $foo;
print "Total elapsed time: ", time-$start, "s\n";
This works fine, but I am not able to send these requests through a SOCKS proxy.
If I export the proxy settings in the terminal, commands like curl and wget work fine through the SOCKS proxy, but this Perl script does not.
I could get it working with LWP::UserAgent, but not with AnyEvent.
I have tried adding
proxy => [ $host, $port ],
below
GET => $url,
but that works only for HTTP and HTTPS proxies, not for a SOCKS proxy.
The documentation for AnyEvent::HTTP has this:
Socks proxies are not directly supported by AnyEvent::HTTP
If you read that section, it may describe a workaround that suits you.
Alternatively, take a look at AnyEvent::HTTP::Socks, which says:
This module adds new ‘socks’ option to all http_* functions exported by AnyEvent::HTTP. So you can specify socks proxy for HTTP requests.
Or AnyEvent::HTTP::LWP::UserAgent, which should allow you to reuse ideas from the working LWP::UserAgent code that you already have.
I am using the code below to post JSON data using LWP::UserAgent. I want to keep my session open and post two requests, but it seems that it's not working on a Linux machine (the two POST requests are sent in two sessions instead of one).
Any suggestions? Thanks in advance.
#!/usr/bin/perl
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common;
open (JSON, "json3.txt") or die "$!";
$raw_string1 = do{ local $/ = undef; <JSON>;
};
my $req = HTTP::Request->new(POST => 'http://www.example.com');
$hdr1 = 'User-Agent';
$val1 = 'Java/1.7.0_45';
$hdr2 = 'Connection';
$val2 = 'keep-alive';
$hdr3 = 'Accept';
$val3 = 'application/json, application/*+json';
$hdr4 = 'Host';
$val4 = 'example.com';
$hdr5 = 'Content-Type';
$val5 = 'application/json;charset=UTF-8';
$req -> header($hdr3 => $val3);
$req -> header($hdr5 => $val5);
$req -> header($hdr1 => $val1);
$req -> header($hdr4 => $val4);
$req -> header($hdr2 => $val2);
$req->content_type("application/json");
$req->content("$raw_string1");
my $ua = LWP::UserAgent->new(keep_alive => 1);
$res = $ua->request($req);
print $res->content;
$res = $ua->request($req);
print $res->content;
Keep-Alive only recommends that the server not close the TCP connection after the request because there will be more requests. The server does not have to follow the recommendation, and in fact many servers don't, in order to keep down the number of open TCP connections, all of which use up resources on the system.
Apart from that, you should not need to set the Connection and Host headers explicitly.
I tried the following simplified example, and a packet capture shows that keep-alive works if the server supports it (LWP 6.05). Supporting it means that the server keeps the connection open, does not send a "Connection: close" header, and either uses HTTP/1.1 or uses HTTP/1.0 together with a "Connection: keep-alive" header.
my $req = HTTP::Request->new(POST => 'http://www.example.com/');
$req->content_type("application/json");
$req->content("foo");
my $ua = LWP::UserAgent->new(keep_alive => 1);
$res = $ua->request($req);
print $res->content;
$res = $ua->request($req);
print $res->content;
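The rule described above (no "Connection: close", and either HTTP/1.1 or HTTP/1.0 with "Connection: keep-alive") can be written down as a small check. This is only a sketch of that rule, not something LWP provides: the helper name server_keeps_alive is hypothetical, and with a real response you would feed it $res->protocol and $res->header('Connection').

```perl
use strict;
use warnings;

# Hypothetical helper: decide from the response protocol and the value of
# the Connection header whether the server agreed to keep the connection
# open, following the rule described in the answer above.
sub server_keeps_alive {
    my ($protocol, $connection) = @_;
    $connection = lc($connection // '');
    $connection =~ s/^\s+|\s+$//g;
    # The Connection header may carry several comma-separated tokens.
    my %tokens = map { $_ => 1 } grep { length } split /\s*,\s*/, $connection;
    return 0 if $tokens{close};
    return 1 if $protocol eq 'HTTP/1.1';
    return 1 if $protocol eq 'HTTP/1.0' && $tokens{'keep-alive'};
    return 0;
}

for my $case (['HTTP/1.1', undef], ['HTTP/1.1', 'close'], ['HTTP/1.0', 'Keep-Alive']) {
    my ($protocol, $connection) = @$case;
    printf "%s / %s: %s\n", $protocol, $connection // '(none)',
        server_keeps_alive($protocol, $connection) ? 'keeps alive' : 'closes';
}
```

If the check says the server closes the connection, no client-side setting will make the two POST requests share one TCP connection.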
It's resolved. It wasn't due to the backend server closing the connection. I think the cause was an old Perl (5.10) on an old Fedora version. I spun up a new CentOS instance and it works there. Thanks.
I want to print the redirected URL in Perl.
Input URL: http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv
Output URL: http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia
use LWP::UserAgent qw();
use CGI qw(:all);
print header();
my ($url) = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
print $res->request;
How can I do this in Perl?
You need to examine the HTTP response to find the URL. The documentation of HTTP::Response gives full details of how to do this, but to summarise, you should do the following:
use strict;
use warnings;
use feature ':5.10'; # enables "say"
use LWP::UserAgent;
my $url = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
# you should add a check to ensure the response was actually successful:
if (! $res->is_success) {
say "GET failed! " . $res->status_line;
}
# show the base URI for the response:
say "Base URI: " . $res->base;
You can view redirects using HTTP::Response's redirects method:
if ($res->redirects) { # are there any redirects?
my @redirects = $res->redirects;
say join(", ", @redirects);
}
else {
say "No redirects.";
}
In this case, the base URI is the same as $url, and if you examine the contents of the page, you can see why.
# print out the contents of the response:
say $res->decoded_content;
Right near the bottom of the page, there is the following code:
$(window).load(function() {
window.setTimeout(function() {
window.location = "http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia"
}, 300);
});
The redirect is handled by javascript, and so is not picked up by LWP::UserAgent. If you want to get this URL, you will need to extract it from the response contents (or use a different client that supports javascript).
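Extracting the URL from the response contents could look something like the sketch below. It assumes the page assigns a quoted string to window.location (or window.location.href), as this page does; the helper name extract_js_redirect is made up for the example, and a regular expression like this will not cope with URLs that the script builds dynamically.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first quoted URL assigned to
# window.location (or window.location.href) in the given text,
# or undef if there is no such assignment.
sub extract_js_redirect {
    my ($html) = @_;
    return unless defined $html;
    if ($html =~ /window\.location(?:\.href)?\s*=\s*(["'])(.*?)\1/s) {
        return $2;
    }
    return;
}

# In a real script this would be $res->decoded_content.
my $page = <<'HTML';
$(window).load(function() {
window.setTimeout(function() {
window.location = "http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog"
}, 300);
});
HTML

my $target = extract_js_redirect($page);
print defined $target ? "Redirect target: $target\n" : "No JavaScript redirect found\n";
```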
On a different note, your script starts off like this:
use LWP::UserAgent qw();
The list following the module name, here qw(), names the subroutines to import into your script so that you can call them without the module prefix. An empty qw() imports nothing, and since LWP::UserAgent is object-oriented and does not export anything by default, you can just omit it.
To have LWP::UserAgent follow redirects, just set the max_redirect option:
use strict;
use warnings;
use LWP::UserAgent qw();
my $url = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new( max_redirect => 5 );
my $res = $ua->get($url);
if ( $res->is_success ) {
print $res->decoded_content; # or whatever
} else {
die $res->status_line;
}
However, that website is using a JavaScript redirect.
$(window).load(function() {
window.setTimeout(function() {
window.location = "http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia"
}, 300);
});
This will not work unless you use a framework that enables JavaScript, like WWW::Mechanize::Firefox.
The last line, print $res->request, does not print a URL: $res->request returns an HTTP::Request object, so you get something like HTTP::Request=HASH(0x...). To print the content of the response instead, use the code below:
use LWP::UserAgent qw();
use CGI qw(:all);
print header();
my ($url) = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
print $res->content;
I'm using LWP::UserAgent to request a lot of page content. I already know the IP addresses of the hosts I am requesting, so I'd like to specify the IP address for each URL so that LWP does not have to spend time on a DNS lookup. I've looked through the documentation but haven't found a solution. Does anyone know of a way to do this? Thanks!
So I found a module that does exactly what I'm looking for: LWP::UserAgent::DNS::Hosts
Here is an example script that I tested and does what I specified in my question:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use LWP::UserAgent::DNS::Hosts;
LWP::UserAgent::DNS::Hosts->register_host(
'www.cpan.org' => '199.15.176.140',
);
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
#actually enforces new DNS settings as if they were in /etc/hosts
LWP::UserAgent::DNS::Hosts->enable_override;
my $response = $ua->get('http://www.cpan.org/');
if ($response->is_success) {
print $response->decoded_content; # or whatever
}
else {
die $response->status_line;
}
Hum, your system should already be caching DNS responses. Are you sure this optimisation would help?
Option 1.
Use
http://192.0.43.10/
instead of
http://www.example.org/
Of course, that will fail if the server does name-based virtual hosting.
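One possible way around the virtual-hosting problem is to keep the IP address in the URL but send the original name in the Host header. The sketch below only builds such a request; the helper name request_via_ip is hypothetical. As far as I know, LWP leaves an explicitly supplied Host header alone for plain HTTP, but treat that as an assumption to verify, and note that for HTTPS the certificate check would still see the IP address.

```perl
use strict;
use warnings;
use URI;
use HTTP::Request;

# Hypothetical helper: build a GET request that targets a known IP address
# but still names the original host in the Host header.
sub request_via_ip {
    my ($url, $ip) = @_;
    my $uri  = URI->new($url);
    my $host = $uri->host;
    my $port = $uri->port;
    # Only mention the port in the Host header if it is not the default.
    my $host_header = $port == $uri->default_port ? $host : "$host:$port";
    $uri->host($ip);
    return HTTP::Request->new(GET => $uri, [ Host => $host_header ]);
}

my $req = request_via_ip('http://www.example.org/index.html', '192.0.43.10');
print $req->as_string;
```

The resulting request can then be passed to $ua->request($req) in place of a plain $ua->get call.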
Option 2.
Replace Socket::inet_aton (called from IO::Socket::INET called from LWP::Protocol::http) with a caching version.
use Socket qw( );
BEGIN {
my $original = \&Socket::inet_aton;
my %cache;
my $caching = sub {
return $cache{$_[0]} //= $original->($_[0]);
};
no warnings 'redefine';
*Socket::inet_aton = $caching;
}
Simply replace the domain name with the IP address in your URL:
use strict;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
# my $response = $ua->get('http://stackoverflow.com/');
my $response = $ua->get('http://64.34.119.12/');
if ($response->is_success) {
print $response->decoded_content; # or whatever
}
else {
die $response->status_line;
}
Debian Wheezy (perl 5.14)
Works fine:
use LWP::Simple;
print get( 'http://ip6-localhost:80' );
Not working (1)
use LWP::Simple;
print get( 'http://[::1]:80' );
Not working (2) [Return: Bad hostname]
use LWP::Simple;
$ua = new LWP::UserAgent();
my $req = new HTTP::Request("GET", "http://[::1]/");
my $res = $ua->request($req);
Not working (3) [Return: Connection refused]
use Net::INET6Glue::INET_is_INET6;
use LWP::Simple;
$ua = new LWP::UserAgent();
my $req = new HTTP::Request("GET", "http://[::1]/");
my $res = $ua->request($req);
Why do I need it? Because ldirectord needs it. :(
Any suggestions?
Another post suggested using INET6Glue
use Net::INET6Glue::INET_is_INET6;
use LWP::Simple;
print get( 'http://[::1]:80' );
print get( 'http://ipv6.google.com' );