HTTP request not going through proxy - perl

I have written this code to fire an HTTP request through a proxy.
But the request does not seem to use the proxy. Even when I give a wrong proxy, it returns OK.
Is there any way I can check whether the HTTP request went via the proxy?
What is the issue in this code that makes it not use the proxy?
sub fire_http_request_through_proxy()
{
    my $proxy = $_;
    my $ua = LWP::UserAgent->new;
    $ENV{HTTP_PROXY} = $proxy;
    $ua->env_proxy; # initialize from environment variables
    $ua->timeout(20);
    my $response = $ua->get('http://www.google.com');
    delete $ENV{HTTP_PROXY};
    if ($response->is_success)
    {
        print $response->decoded_content . "\n";
    }
    else
    {
        die $response->status_line;
    }
}

Sebastian and oalders have already solved your problem, but I'd just like to note that you don't need to mess around with $ENV{HTTP_PROXY} anyway — you can just use $ua->proxy(), like this:
$ua->proxy( http => 'http://1.1.1.1' );
or even:
$ua->proxy( ['http', 'https', 'ftp'] => 'http://1.1.1.1' );
P.S. If you really want to check which proxy was used by LWP for a particular request, you can peek at $response->request->{proxy}, which should be a URI object. However, as far as I know, this property is undocumented (I found out about it by reading the source) and thus subject to change in later versions of LWP. Use at your own risk!
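As a minimal sketch of that peek (remember, the proxy slot is undocumented, so treat this as illustrative only; the proxy address is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(20);
$ua->proxy( http => 'http://1.1.1.1' );   # hypothetical proxy address

my $response = $ua->get('http://www.google.com');

# Undocumented: the request object remembers which proxy LWP chose.
my $proxy_used = $response->request->{proxy};
print "proxied via: $proxy_used\n" if $proxy_used;
```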

Are you sure that $_ has a true value? This dies appropriately for me:
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;
fire_http_request_through_proxy();
sub fire_http_request_through_proxy {
    my $ua = LWP::UserAgent->new;
    local $ENV{HTTP_PROXY} = 'http://1.1.1.1';
    $ua->env_proxy; # initialize from environment variables
    $ua->timeout( 20 );
    my $response = $ua->get( 'http://www.google.com' );
    if ( $response->is_success ) {
        print $response->decoded_content . "\n";
    }
    else {
        die $response->status_line;
    }
}
So, maybe $_ isn't what you think it is. If it's not defined, then no proxy will be used. Having said that, $_ is probably not the variable you want to use here. You could either declare a variable for use in this case, pass a variable right to the subroutine or actually set an ENV variable outside of the script.
One other point. Rather than setting and deleting the ENV var in your script, just declare the change with local and it will only take effect inside this block of code. That way you don't have to clean up after yourself and you don't risk overriding vars which may have been set elsewhere.
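The effect of local on %ENV is easy to demonstrate in isolation. A minimal sketch (the proxy values here are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

$ENV{HTTP_PROXY} = 'http://original:8080';

sub with_temp_proxy {
    # Overrides the entry only until this sub returns; no cleanup needed.
    local $ENV{HTTP_PROXY} = 'http://temporary:3128';
    return $ENV{HTTP_PROXY};
}

print with_temp_proxy(), "\n";  # http://temporary:3128
print $ENV{HTTP_PROXY}, "\n";   # http://original:8080 (restored automatically)
```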

Take a look at your code sub fire_http_request_through_proxy(), especially the last two characters... This is a prototype. Basically you are saying "I don't take any arguments", checked at compile time.
I guess you are simply invoking the sub before its declaration, which is why the prototype isn't enforced. Always use warnings:
main::fire_http_request_through_proxy() called too early to check
prototype at test.pl line ...
So either change it to fire_http_request_through_proxy or change it to fire_http_request_through_proxy($) and invoke it after its declaration.
More about prototyping in perlsub.
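The empty prototype can be seen in isolation. A small sketch (the sub name is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declared before any call, so the empty () prototype is enforced at compile time.
sub takes_no_args() { return 42 }

print takes_no_args(), "\n";   # fine: no arguments

# The next line would be a compile-time error, because the () prototype
# promises the sub takes no arguments:
# print takes_no_args(1);      # "Too many arguments for main::takes_no_args"
```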

Be sure:
to read the parameter as $_[0] or pop, not $_
to not include () in the sub definition
Script:
sub fire_http_request_through_proxy {
    my $proxy   = $_[0];
    my $timeout = 20;
    my $url     = 'http://www.google.com';
    my $ua = LWP::UserAgent->new;
    $ua->proxy(['http', 'https', 'ftp'] => $proxy);
    $ua->timeout($timeout);
    my $response = $ua->get($url);
    if ($response->is_success) {
        print $response->decoded_content . "\n";
    }
    else {
        die $response->status_line;
    }
}
Test:
To make it work, the proxy parameter has to be in the correct format (http://host:port):
fire_http_request_through_proxy('http://176.34.248.142:9050');

Related

Reuse LWP Useragent object with HTTP POST query in a for/while loop

I am using LWP Useragent to make multiple POST calls with basic Authorization, wherein POST URL parameters are read from a CSV file. Here is my code:
use strict;
use warnings;
use LWP::UserAgent;
use JSON 'from_json';
use MIME::Base64 'encode_base64';
use Data::Dumper;
my @cryptoassets;
my %data;
my $response;
my $csvfile = 'ScrappedData_Coins.csv';
my $dir = "CurrencyImages";
open (my $csv, '<', "$dir/$csvfile") || die "can't open: $!";
foreach (<$csv>) {
    chomp;
    my @currencyfields = split(/,/);
    push(@cryptoassets, \@currencyfields);
}
close $csv;
close $csv;
my $url = 'https://example.org/objects?';
my %options = (
    "username" => 'API KEY',
    "password" => '',   # password field is left blank
);
my $ua = LWP::UserAgent->new(keep_alive => 1);
$ua->agent("MyApp/0.1");
$ua->default_header(
    Authorization => 'Basic ' . encode_base64( $options{username} . ':' . $options{password} )
);
my $count = 0;
foreach my $row (@cryptoassets) {
    $response = $ua->post(
        $url,
        Content_Type => 'multipart/form-data',
        Content => {
            'name'                    => ${$row}[1],
            'lang'                    => 'en',
            'description'             => ${$row}[6],
            'parents[0][Objects][id]' => 42100,
            'Objects[imageFiles][0]'  => [${$row}[4]],
        }
    );
    if ( $response->is_success ) {
        my $json = eval { from_json( $response->decoded_content ) };
        print Dumper $json;
    }
    else {
        print $response->status_line;
    }
    sleep(2);
}
Basically, I want to reuse the LWP object. For this, I am creating the LWP object and its headers once, with keep_alive set to true so that the connection is kept open between server and client. However, the response from the server is not what I expect. One parameter value ('parents[0][Objects][id]' => 42100) sometimes fails to reach the server in the HTTP POST calls. In fact, the behavior is random: sometimes the parent-ID value is passed and sometimes not, while all other param values pass correctly. Is this a problem caused by reusing the LWP agent object, or is there some other problem? When I make a single HTTP POST call, all the param values are passed correctly, which is not the case when doing it in a loop. I want to make 50+ POST calls.
Reusing the user-agent object would not be my first suspicion.
Mojo::UserAgent returns a complete transaction object when you make a request, so it's easy to inspect the request even after I've sent it. The lack of that always annoyed me about LWP: you can do it, but you have to break down the work of forming the request first.
In this case, create the query hash first, then look at it before you send it off. Does it have everything that you expect?
Then, look at the request. Does the request match the hash you just gave it?
Also, when does it go wrong? Is the first request okay but the second fails, or several are okay then one fails?
Instead of testing against your live system, you might try httpbin.org. You can send it requests in various ways:
use Mojo::UserAgent;
use Mojo::Util qw(dumper);
my $hash = { ... };
say dumper( $hash );
my $ua = Mojo::UserAgent->new;
$ua->on( prepare => sub { ... } ); # add default headers, etc
my $tx = $ua->post( $url, form => $hash );
say "Request: " . $tx->req->to_string;
I found the solution myself. I was passing the form parameters (key/value pairs) to the POST method as a hashref. I changed it to an arrayref and the problem was solved. I read how to pass data to the POST method on the CPAN page. Thus, reusing the LWP object is not an issue, as pointed out by @brian d foy.
CPAN HTTP LWP::UserAgent API
CPAN HTTP Request Common API
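The hashref/arrayref difference is easy to see without touching the live server: HTTP::Request::Common (which LWP's post() uses under the hood) accepts either, but only an arrayref gives the pairs a fixed order and allows repeated keys. A sketch with hypothetical field values:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# An arrayref keeps the pairs in a fixed, predictable order and can carry
# repeated keys; a hashref yields random order and unique keys only.
my $req = POST 'https://example.org/objects',
    Content_Type => 'form-data',
    Content      => [
        'name'                    => 'Bitcoin',   # hypothetical values
        'lang'                    => 'en',
        'parents[0][Objects][id]' => 42100,
    ];

# Inspect the multipart body before sending anything:
print $req->content;
```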

Perl: An asynchronous http proxy via mojolicious

I made a simple http proxy. It works fine, but it's not fast, because in the function handle_request I use
my $tx = $ua->start( Mojo::Transaction::HTTP->new(req=>$request) );
to do the request, and it's blocking.
I tried to use a callback like:
$ua->start( Mojo::Transaction::HTTP->new(req=>$request) => sub{ ... } )
to make it non-blocking, and then got an error:
'error' => { 'message' => 'Premature connection close'}
I guess that's because the function handle_request returns immediately; it does not wait for the callback to finish. If I use a semaphore to wait for the callback, that means it's blocking again.
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Mojo::IOLoop::Server;
use Mojo::UserAgent;
use Mojo::Message::Response;
use Mojo::Message::Request;
use Mojo::Transaction::HTTP;
use Data::Dumper;
binmode STDOUT, ":encoding(UTF-8)";
my %buffer;

Mojo::IOLoop->server( {port => 3128} => sub {
    my ($loop, $stream, $client) = @_;
    $stream->on(
        read => sub {
            my ($stream, $chunk) = @_;
            my $buffer = $buffer{$client}{read_buffer} .= $chunk;
            if ($buffer =~ /^GET\s+|POST\s+|HEAD\s+(.*)\r\n\r\n$/i) {
                $buffer{$client}{read_buffer} = '';
                &handle_request($client, $stream, $buffer);
            }
            elsif ($buffer =~ /^CONNECT\s+(.*)\r\n\r\n$/i) {
                $buffer{$client}{read_buffer} = '';
                &handle_connect($stream, $buffer);
            }
            elsif ($buffer{$client}{connection}) {
                $buffer{$client}{read_buffer} = '';
                Mojo::IOLoop->stream($buffer{$client}{connection})->write($chunk);
            }
            if (length($buffer) >= 20 * 1024 * 1024) {
                delete $buffer{$client};
                Mojo::IOLoop->remove($client);
                return;
            }
        });
});

sub handle_request {
    my ($client, $stream, $chunk) = @_;
    my $request = Mojo::Message::Request->new;
    $request = $request->parse($chunk);
    my $ua = Mojo::UserAgent->new;
    my $tx = $ua->start( Mojo::Transaction::HTTP->new(req => $request) );
    $stream->write( $tx->res->to_string );
}

sub handle_connect {
    my ($stream, $chunk) = @_;
    my $request = Mojo::Message::Request->new;
    my $ua = Mojo::UserAgent->new;
    $request = $request->parse($chunk);
    print Dumper($request);
}

Mojo::IOLoop->start;
Hope to get some suggestions .
You have two problems:
You call the non-blocking variant of $ua->start from code written in a blocking style. The function handle_request must take a callback as a parameter. If you have a chain of callbacks, the best way to implement it is to use Mojo::IOLoop::Delay.
When you create the variable $ua in the non-blocking style inside sub handle_request, it is destroyed by the garbage collector: handle_request returns first, $ua is destroyed because it is a local variable, and only then does the answer from $ua arrive. That is why you get Premature connection close. You need to store the instance of $ua somewhere else to prevent this error.
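One way to keep $ua alive long enough is to stash a reference in per-client state until the callback fires. A hedged sketch (reusing the question's %buffer hash; not a complete proxy):

```perl
use Mojo::UserAgent;
use Mojo::Message::Request;
use Mojo::Transaction::HTTP;

my %buffer;   # per-client state, as in the question's code

sub handle_request {
    my ($client, $stream, $chunk) = @_;
    my $request = Mojo::Message::Request->new;
    $request = $request->parse($chunk);

    my $ua = Mojo::UserAgent->new;
    $buffer{$client}{ua} = $ua;   # keep a reference so $ua survives this sub

    $ua->start( Mojo::Transaction::HTTP->new(req => $request) => sub {
        my ($ua, $tx) = @_;
        $stream->write( $tx->res->to_string );
        delete $buffer{$client}{ua};   # drop the reference once we're done
    });
    # handle_request returns immediately; the callback writes the response later
}
```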
Upd.
I wrote a bad variant of an http/https proxy which works only via the CONNECT method and has a bug with not reading the full first http message.
Upd.
I added another example of an http/https proxy which correctly reads the first http message and works not only via the CONNECT method.
Upd.
Oh, the author of Mojolicious wrote an example of an https proxy.

LWP::UserAgent set ip of requested url so LWP doesn't have to do dns lookup

I'm using LWP::UserAgent to request a lot of page content. I already know the ip of the urls I am requesting so I'd like to be able to specify the ip address of where the url I am requesting is hosted, so that LWP does not have to spend time doing a dns lookup. I've looked through the documentation but haven't found any solutions. Does anyone know of a way to do this? Thanks!
So I found a module that does exactly what I'm looking for: LWP::UserAgent::DNS::Hosts
Here is an example script that I tested and does what I specified in my question:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use LWP::UserAgent::DNS::Hosts;
LWP::UserAgent::DNS::Hosts->register_host(
    'www.cpan.org' => '199.15.176.140',
);

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

# actually enforces new DNS settings as if they were in /etc/hosts
LWP::UserAgent::DNS::Hosts->enable_override;

my $response = $ua->get('http://www.cpan.org/');
if ($response->is_success) {
    print $response->decoded_content; # or whatever
}
else {
    die $response->status_line;
}
Hmm, your system should already be caching DNS responses. Are you sure this optimisation would help?
Option 1.
Use
http://192.0.43.10/
instead of
http://www.example.org/
Of course, that will fail if the server does name-based virtual hosting.
Option 2.
Replace Socket::inet_aton (called from IO::Socket::INET called from LWP::Protocol::http) with a caching version.
use Socket qw( );

BEGIN {
    my $original = \&Socket::inet_aton;
    my %cache;
    my $caching = sub {
        return $cache{$_[0]} //= $original->($_[0]);
    };
    no warnings 'redefine';
    *Socket::inet_aton = $caching;
}
Simply replace the domain name with the IP address in your URL:
use strict;
require LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

# my $response = $ua->get('http://stackoverflow.com/');
my $response = $ua->get('http://64.34.119.12/');

if ($response->is_success) {
    print $response->decoded_content; # or whatever
}
else {
    die $response->status_line;
}

Why does CGI::Session new and load fail ( couldn't thaw() )?

I tried using the CGI::Session library, but for some reason my code won't keep a persistent session... this uses Moose for OOP, with Moose builders to instantiate the _cgi and _sss (session) attributes of a My::Session object...
UPDATED CODE
My::Role::PersistsData
package My::Role::PersistsData;
use Moose::Role;
use namespace::autoclean;
has '_cgi' => (
    is      => 'rw',
    isa     => 'Maybe[CGI]',
    builder => '_build_cgi'
);

has '_sss' => (
    is      => 'rw',
    isa     => 'Maybe[CGI::Session]',
    builder => '_build_sss'
);
My::Session
package My::Session;
use Moose;
use namespace::autoclean;
with 'My::Role::PersistsData';
use CGI;
use CGI::Session ('-ip_match');
use CGI::Carp qw/fatalsToBrowser warningsToBrowser/;
sub start {
    my ($self) = @_;
    my $cgi = $self->cgi();
    $self->log("Session Started!");
}

sub cgi {
    my ($self) = @_;
    $self->_cgi( $self->_build_cgi() ) unless $self->_cgi;
    return ( $self->_cgi );
}

sub _build_cgi {
    my ($self) = @_;
    my $cgi = CGI->new();
    if (!$cgi) {
        #print "missing cgi";
    }
    return ( $cgi );
}

sub _build_sss {
    my ($self) = @_;
    my $cgi = $self->cgi();
    my $sid = $cgi->cookie("CGISESSID") || $cgi->param('CGISESSID') || undef;
    $self->log("Session ID Initial is: " . ($sid ? $sid : "undef"));
    my $sss = CGI::Session->new(undef, $cgi, {Directory => 'tmp'}) or die CGI::Session->errstr;
    my $cookie = $cgi->cookie(CGISESSID => $sss->id());
    $self->log("Resulting Session ID is: " . $sid . " cookie is: " . $cookie);
    print $cgi->header( -cookie => $cookie );
    return ( $sss );
}
main.pl
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use My::Session;
$| = 1;
$, = " ";
$\ = "\n <br />";
my $sss = My::Session->new();
$sss->start();
print Dumper($sss);
It's pretty weird because the first time I run this I get an actual CGISESSID and am able to carry it over on a page refresh...
However, if I load the page again, the $sss (session) suddenly comes back as undefined, when it should return a new Session object:
$sss = new CGI::Session("driver:File", $sid, {Directory=>'/tmp'})
For some reason $sss is coming back as undefined, which means it didn't initiate a new Session. A few tweaks to my code revealed this error:
new(): failed: load(): couldn't thaw() data using CGI::Session::Serialize::default:thaw(): couldn't thaw. syntax error at (eval 253) line 2, near "/>"
I also snooped around in CGI::Session.pm and found where this error was being thrown. I guess it's not able to parse _DATA, or even read it, because of some strange characters: "/>"
CGI::Session.pm
....
$self->{_DATA} = $self->{_OBJECTS}->{serializer}->thaw($raw_data);
unless ( defined $self->{_DATA} ) {
    #die $raw_data . "\n";
    return $self->set_error( "load(): couldn't thaw() data using $self->{_OBJECTS}->{serializer} :" .
        $self->{_OBJECTS}->{serializer}->errstr );
}
Any idea why this isnt working?
Most likely this is due to a different session cookie being sent (been there, hit that wall with my head. HARD).
Print the session cookie value used to store the session initially, as well as the session cookie value supplied by the subsequent request.
If they are indeed different, you have two options:
Investigate why a different session cookie is sent by the browser in subsequent requests and fix that issue somehow.
I was never able to find a conclusive answer, but my app consisted of a frame with an internal <iframe>, so I suspect it was due to that.
If, like me, you can't find the root cause, you can work around it.
My workaround: explicitly STORE the original session cookie value as a form variable passed around by 100% of your code pieces.
Then re-initialize the session object with the correct cookie value before your server-side code requests session data.
Not very secure, annoying, and hard to get right. But it works. I wouldn't recommend it except as an uber-last-resort hack.
Perhaps you could try (or at least look at the code of) some stateful webapp module. I have used Continuity; very cool stuff.
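The workaround above can be sketched roughly as follows (variable names are hypothetical; error handling omitted):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Session;

my $cgi = CGI->new;

# Prefer the session id explicitly carried in the form over the
# (possibly wrong) cookie, then fall back to creating a new session.
my $sid = $cgi->param('CGISESSID') || $cgi->cookie('CGISESSID') || undef;
my $session = CGI::Session->new("driver:File", $sid, { Directory => '/tmp' })
    or die CGI::Session->errstr;

# Emit a hidden field in every form so the next request can restore
# the same session even if the browser sends a different cookie.
print $cgi->hidden(-name => 'CGISESSID', -value => $session->id);
```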
For some reason you can't print raw Data::Dumper output, or anything else containing HTML-like tags, while using CGI::Session.
Answer found here and here.
Removing the Dumper and HTML output fixed this problem -- kind of --
Updated: apparently you have to escape the output:
$cgi->escapeHTML( Dumper($session) );
and that FINALLY resolves this problem.
Perl is a pain!

What is the easiest way in pure Perl to stream from another HTTP resource?

What is the easiest way (without opening a shell to curl and reading from stdin) in Perl to stream from another HTTP resource? I'm assuming here that the HTTP resource I'm reading from is a potentially infinite stream (or just really, really long).
Good old LWP allows you to process the result as a stream.
E.g., here's a callback to yourFunc, reading/passing $byte_count bytes to each call to yourFunc (you can drop that param if you don't care how large the data is to each call, and just want to process the stream as fast as possible):
use LWP;
...
$browser = LWP::UserAgent->new();
$response = $browser->get($url,
    ':content_cb'     => \&yourFunc,
    ':read_size_hint' => $byte_count,
);
...
sub yourFunc {
    my ($data, $response) = @_;
    # do your magic with $data
    # $response will be a response object created once/if get() returns
}
HTTP::Lite's request method allows you to specify a callback.
The $data_callback parameter, if used, is a way to filter the data as it is received or to handle large transfers. It must be a function reference, and will be passed: a reference to the instance of the http request making the callback, a reference to the current block of data about to be added to the body, and the $cbargs parameter (which may be anything). It must return either a reference to the data to add to the body of the document, or undef.
However, looking at the source, there seems to be a bug in sub request in that it seems to ignore the passed callback. It seems safer to use set_callback:
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Lite;
my $http = HTTP::Lite->new;
$http->set_callback(\&process_http_stream);
$http->http11_mode(1);
$http->request('http://www.example.com/');

sub process_http_stream {
    my ($self, $phase, $dataref, $cbargs) = @_;
    warn $phase, "\n";
    return;
}
Output:
C:\Temp> ht
connect
content-length
done-headers
content
content-done
data
done
It looks like a callback passed to the request method is treated differently:
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Lite;
my $http = HTTP::Lite->new;
$http->http11_mode(1);
my $count = 0;
$http->request('http://www.example.com/',
    \&process_http_stream,
    \$count,
);

sub process_http_stream {
    my ($self, $data, $times) = @_;
    ++$$times;
    print "$$times====\n$$data\n===\n";
}
Wait, I don't understand. Why are you ruling out a separate process? This:
open my $stream, "-|", "curl $url" or die;
while(<$stream>) { ... }
sure looks like the "easiest way" to me. It's certainly easier than the other suggestions here...
Event::Lib will give you an easy interface to the fastest asynchronous IO method for your platform.
IO::Lambda is also quite nice for creating fast, responsive, IO applications.
Here is a version I ended up using via Net::HTTP
This is basically a copy of the example from the Net::HTTP man page / perl doc
use Net::HTTP;

my $s = Net::HTTP->new(Host => "www.example.com") || die $@;
$s->write_request(GET => "/somestreamingdatasource.mp3");
my ($code, $mess, %h) = $s->read_response_headers;

while (1) {
    my $buf;
    my $n = $s->read_entity_body($buf, 4096);
    die "read failed: $!" unless defined $n;
    last unless $n;
    print STDERR "got $n bytes\n";
    print STDOUT $buf;
}