Perl - HTTP::Proxy capture XHR/JSON communication

The site http://openbook.etoro.com/#/main/ has a live feed that is generated by JavaScript via XHR keep-alive requests; the server answers with gzip-compressed JSON strings.
I want to capture the feed into a file.
The usual way (WWW::Mech..) is (probably) not viable, because reverse engineering all the JavaScript in the page and simulating the browser is a really hard task, so I'm looking for an alternative solution.
My idea is to use a man-in-the-middle tactic: the browser does its work, and I capture the communication via a Perl proxy dedicated only to this task.
I'm able to catch the initial communication, but not the feed itself. The proxy is working OK, because the feed is running in the browser; only my filters don't work.
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use HTTP::Proxy::BodyFilter::simple;
use Data::Dumper;
use strict;
use warnings;
my $proxy = HTTP::Proxy->new(
port => 3128, max_clients => 100, max_keep_alive_requests => 100
);
my $hfilter = HTTP::Proxy::HeaderFilter::simple->new(
sub {
my ( $self, $headers, $message ) = @_;
print STDERR "headers", Dumper($headers);
}
);
my $bfilter = HTTP::Proxy::BodyFilter::simple->new(
filter => sub {
my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
print STDERR "dataref", Dumper($dataref);
}
);
$proxy->push_filter( response => $hfilter); #header dumper
$proxy->push_filter( response => $bfilter); #body dumper
$proxy->start;
Firefox is configured to use the above proxy for all communication.
The feed is running in the browser, so the proxy is feeding it with data. (When I stop the proxy, the feed stops too.) Randomly (I can't figure out when) I get the following error:
[Tue Jul 10 17:13:58 2012] (42289) ERROR: Getting request failed: Client closed
Can anybody show me how to construct the correct HTTP::Proxy filter to dump all communication between the browser and the server, regardless of keep-alive XHR?

Here's something that I think does what you're after:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::BodyFilter::complete;
use HTTP::Proxy::BodyFilter::simple;
use JSON::XS qw( decode_json );
use Data::Dumper qw( Dumper );
my $proxy = HTTP::Proxy->new(
port => 3128,
max_clients => 100,
max_keep_alive_requests => 100,
);
my $filter = HTTP::Proxy::BodyFilter::simple->new(
sub {
my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
return unless $$dataref;
my $content_type = $message->headers->content_type or return;
say "\nContent-type: $content_type";
my $data = decode_json( $$dataref );
say Dumper( $data );
}
);
$proxy->push_filter(
method => 'GET',
mime => 'application/json',
response => HTTP::Proxy::BodyFilter::complete->new,
response => $filter
);
$proxy->start;
I don't think you need a separate header filter because you can access any headers you want to look at using $message->headers in the body filter.
You'll note that I pushed two filters onto the pipeline. The first one is of type HTTP::Proxy::BodyFilter::complete, and its job is to collect the chunks of the response and ensure that the real filter that follows always gets a complete message in $dataref. However, for each chunk that's received and buffered, the following filter will still be called and passed an empty $dataref. My filter ignores these calls by returning early.
I also set up the filter pipeline to ignore everything except GET requests that resulted in JSON responses - since these seem to be the most interesting.
Thanks for asking this question - it was an interesting little problem and you seemed to have done most of the hard work already.
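Since the original goal was to capture the feed into a file, the early-return logic described above could be factored into two small helpers. This is only a sketch: the names decode_complete_body and append_to_log are hypothetical, and it uses the core JSON::PP module instead of JSON::XS so that it runs without extra dependencies.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

# Hypothetical helper: returns the decoded structure, or undef when the
# chunk is empty (still being buffered) or the body is not valid JSON.
sub decode_complete_body {
    my ($dataref) = @_;
    return unless defined $$dataref && length $$dataref;
    my $data = eval { decode_json($$dataref) };
    return $data;
}

# Hypothetical helper: append one decoded message per line to a log file.
sub append_to_log {
    my ( $path, $data ) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    print {$fh} JSON::PP->new->canonical->encode($data), "\n";
    close $fh or die "Cannot close $path: $!";
    return;
}
```

Inside the body filter, one could then write something like `my $data = decode_complete_body($dataref) or return; append_to_log( 'feed.log', $data );`, where feed.log is an assumed file name.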

Set the mime parameter; the default is to filter text types only.
$proxy->push_filter(response => $hfilter, mime => 'application/json');
$proxy->push_filter(response => $bfilter, mime => 'application/json');

Related

Reuse LWP Useragent object with HTTP POST query in a for/while loop

I am using LWP Useragent to make multiple POST calls with basic Authorization, wherein POST URL parameters are read from a CSV file. Here is my code:
use strict;
use warnings;
use LWP::UserAgent;
use JSON 'from_json';
use MIME::Base64 'encode_base64';
use Data::Dumper;
my @assets;
my %data;
my $response;
my $csvfile = 'ScrappedData_Coins.csv';
my $dir = "CurrencyImages";
open (my $csv, '<', "$dir/$csvfile") || die "cant open";
foreach (<$csv>) {
chomp;
my @currencyfields = split(/\,/);
push(@assets, \@currencyfields);
}
close $csv;
my $url = 'https://example.org/objects?';
my %options = (
"username" => 'API KEY',
"password" => '' ); # Password field is left blank
my $ua = LWP::UserAgent->new(keep_alive=>1);
$ua->agent("MyApp/0.1");
$ua->default_header(
Authorization => 'Basic '. encode_base64( $options{username} . ':' . $options{password} )
);
my $count =0;
foreach my $row (@assets) {
$response = $ua->post(
$url,
Content_Type => 'multipart/form-data',
Content => {
'name'=>${$row}[1],
'lang' => 'en',
'description' => ${$row}[6],
'parents[0][Objects][id]' => 42100,
'Objects[imageFiles][0]' =>[${$row}[4]],
}
);
if ( $response->is_success ) {
my $json = eval { from_json( $response->decoded_content ) };
print Dumper $json;
}
else {
# Report the HTTP status when the request fails.
print $response->status_line, "\n";
}
sleep(2);
}
Basically, I want to reuse the LWP object. For this, I am creating the LWP object, its headers, and response objects once with the option of keep_alive true, so that connection is kept open between server and client. However, the response from the server is not what I want to achieve. One parameter value ('parents[0][Objects][id]' => 42100) seems to not get passed to the server in HTTP POST calls. In fact, its behavior is random, sometimes the parentID object value is passed, and sometimes not, while all other param values are passing correctly. Is this a problem due to the reusing of the LWP agent object or is there some other problem? Because when I make a single HTTP POST call, all the param values are passed correctly, which is not the case when doing it in a loop. I want to make 50+ POST calls.
Reusing the user-agent object would not be my first suspicion.
Mojo::UserAgent returns a complete transaction object when you make a request, so it's easy to inspect the request even after it has been sent. That's one of its huge benefits, and the lack of it has always annoyed me about LWP. You can do it with LWP, but you have to break down the work and form the request first.
In this case, create the query hash first, then look at it before you send it off. Does it have everything that you expect?
Then, look at the request. Does the request match the hash you just gave it?
Also, when does it go wrong? Is the first request okay but the second fails, or several are okay then one fails?
Instead of testing against your live system, you might try httpbin.org. You can send it requests in various ways:
use 5.010; # enables say
use Mojo::UserAgent;
use Mojo::Util qw(dumper);
my $hash = { ... };
say dumper( $hash );
my $ua = Mojo::UserAgent->new;
$ua->on( prepare => sub { ... } ); # add default headers, etc
my $tx = $ua->post( $url, form => $hash );
say "Request: " . $tx->req->to_string;
I found the solution myself. I was passing the form parameter data (key/value pairs) to the POST method as a hashref. I changed it to an arrayref and the problem was solved. I read how to pass data to the POST method on the CPAN page. Thus, reusing the LWP object is not the issue, as pointed out by @brian d foy.
CPAN HTTP LWP::UserAgent API
CPAN HTTP Request Common API
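The fix described above can be sketched by building the request with HTTP::Request::Common and inspecting it before it is sent. The field values here are hypothetical and the URL is the placeholder from the question. One plausible explanation for the random behaviour is that Perl does not guarantee hash key order, so a hashref can serialize the fields in a different order on each run, while an arrayref keeps them fixed; whether field order was really the cause is an assumption.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw( POST );

# Hypothetical field values; the URL is the placeholder from the question.
my $url    = 'https://example.org/objects?';
my @fields = (
    'name'                    => 'Example coin',
    'lang'                    => 'en',
    'description'             => 'Example description',
    'parents[0][Objects][id]' => 42100,
);

# An array reference keeps the fields in exactly this order.
my $request = POST(
    $url,
    Content_Type => 'form-data',
    Content      => \@fields,
);

# Inspect the request before sending it, e.g. with $ua->request($request).
print $request->as_string;
```

Printing the request this way also answers the earlier suggestion to check whether the request matches the data you gave it.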

Perl REST::Client best way to get and use CSRFToken and session id through cookies

So right now I'm using REST::Client which does a perfectly good job when it comes to making GET requests and fetching JSON data. However, the API in question, when issuing a POST request, should pass a CSRF token and session id and then if entering the right credentials through JSON should then be used further on for all POST requests.
The thing is, I see no way to fetch the cookie using REST::Client, so I tried LWP: I was able to execute a JSON request and set up the cookie settings, but still got no cookies.
I tried storing it in a file, tried in a variable, still nothing
$mech->cookie_jar($cookies);
So how do I get these cookies?
P.S I'm sure the request is executed, since I'm seeing the right output and I'm seeing the cookies with a third party rest client.
EDIT:
#!/usr/bin/perl
use REST::Client;
use JSON;
use Data::Dumper;
use MIME::Base64;
use 5.010;
use LWP::UserAgent;
use HTTP::Cookies;
my $first = $ARGV[0];
my $username = 'user@user.com';
my $password = 'password';
my $cookies = HTTP::Cookies->new();
my $ua = LWP::UserAgent->new( cookie_jar => $cookies );
my $headers = { 'Content-type' => 'application/json' };
my $client = REST::Client->new( { useragent => $ua });
my $res = $client->POST('https://URL/action/?do=login',
'{"username": "user@user.com", "password":"password"}', {"Content-type" => 'application/json'});
chkerr($client->responseCode());
print $client->responseContent();
#print $client->responseHeaders();
#$cookies->extract_cookies($res);
print "\n" . $cookies->as_string;
sub chkerr {
my $res = shift;
if($res eq '200') {
print "Success\n";
} else { print "API Call failed: $res\n";
#exit(1);
}
}
The code is really dirty since I've tried about 50 different things now.
The output is as follows:
Success
true -> this indicated that login is successful
Set-Cookie3: __cfduid=d3507306fc7b69798730649577c267a2b1369379851; path="/"; domain=.domain.com; path_spec; expires="2019-12-23 23:50:00Z"; version=0
From the documentation, it seems that REST::Client is using LWP::UserAgent internally. By default, LWP::UserAgent ignores cookies unless you set its cookie_jar attribute.
So you could do something like this (untested code):
my $ua = LWP::UserAgent->new( cookie_jar => {} );
my $rc = REST::Client->new( { useragent => $ua } );
While innaM's answer will work this seems more straight forward to me:
$client->getUseragent()->cookie_jar({}); # empty internal cookie jar
This assumes you already have a REST::Client object in $client.
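Once a cookie jar is attached, its contents can be read back with the scan method of HTTP::Cookies. The sketch below fills a jar by hand so that it runs without a network connection; the cookie name, value and domain are hypothetical stand-ins for whatever the server sets.

```perl
use strict;
use warnings;
use HTTP::Cookies;

my $jar = HTTP::Cookies->new;

# Stand-in for a cookie the server would set; all values are hypothetical.
# Arguments: version, key, value, path, domain, port, path_spec, secure,
# max-age in seconds, discard flag.
$jar->set_cookie( 0, 'sessionid', 'abc123', '/', 'api.example.com', undef, 1, 0, 3600, 0 );

# scan calls the callback once per stored cookie.
my %cookie_for;
$jar->scan(
    sub {
        my ( $version, $key, $value, $path, $domain ) = @_;
        $cookie_for{$key} = $value;
    }
);

print "$_ = $cookie_for{$_}\n" for sort keys %cookie_for;
```

With REST::Client, the jar to scan would be the one returned by `$client->getUseragent()->cookie_jar` after the login request.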

Get raw response headers from LWP?

Is there a way to grab raw, unmodified response headers from an HTTP request made with LWP? This is for a diagnostic tool that needs to identify problems with possibly malformed headers.
The closest thing I've found is:
use LWP::UserAgent;
my $ua = new LWP::UserAgent;
my $response = $ua->get("http://somedomain.com");
print $response->headers()->as_string();
But this actually parses the headers, and then reconstructs a canonicalized, cleaned-up version of them from the parsed data. I really need the entire header text in exactly the form in which it was returned by the server, so anything malformed or non-standard will be clearly identifiable.
If it turns out there is no way to do this with LWP, is there perhaps some other Perl module that can do this?
Net::HTTP provides lower level access with less processing. Since it is a subclass of IO::Socket::INET you can read directly from the object after making the request.
use Net::HTTP;
# Make the request using Net::HTTP.
my $s = Net::HTTP->new(Host => "www.perl.com") || die $@;
$s->write_request(GET => "/", 'User-Agent' => "Mozilla/5.0");
# Read the raw headers.
my @headers;
while(my $line = <$s>) {
# Headers are done on a blank line.
last unless $line =~ /\S/;
push @headers, $line;
}
print @headers;
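For the diagnostic use case in the question, the raw lines collected this way could be checked without any parsing library. The helper below is a hypothetical sketch: it treats the first line as the status line, skips folded continuation lines, and flags every other line that does not start with a field name made of RFC 7230 token characters followed by a colon.

```perl
use strict;
use warnings;

# Hypothetical helper: given raw header lines (status line first), return
# the lines that do not look like "field-name: value".
sub suspect_header_lines {
    my ( $status_line, @lines ) = @_;
    # Characters allowed in a header field name (RFC 7230 "token").
    my $token = qr/[!#\$%&'*+\-.^_`|~0-9A-Za-z]+/;
    return grep { !/^${token}:/ && !/^[ \t]/ } @lines;
}
```

It could be called as `print suspect_header_lines(@headers);` on the array filled by the loop above.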
Based on an inspection of the HTTP::Response object (and the HTTP::Headers object it contains), the headers are discarded as they're parsed.
I would recommend you try WWW::Curl instead.
EDIT Snippet using WWW::Curl:
use WWW::Curl::Easy;
my ($header, $body);
my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_URL, $url_to_get); # get this URL
$curl->setopt(CURLOPT_WRITEHEADER, \$header); # save header text in this var
$curl->setopt(CURLOPT_WRITEDATA, \$body); # save body text in this var
my $code = $curl->perform;
if (0 == $code) {
# header text is in $header, body text in $body
} else {
print $curl->strerror($code).": ".$curl->errbuf."\n";
}

Extracting cookies from a Mojolicious user agent response

I started using the Mojolicious library for testing and everything was working fine until I tried to extract cookies from a response.
I've tried several variants of:
$ua = Mojo::UserAgent->new();
$ua->on( error => sub { my ($ua, $error) = @_; say "This looks bad: $error"; } );
$ua->max_redirects(1)->connect_timeout(10)->request_timeout(20);
$ua->cookie_jar(Mojo::CookieJar->new);
# ... later ...
my $tx = $ua->get($url);
my $jar = $ua->cookie_jar->extract($tx); # This is undef
I can however extract the cookies via LWP::UserAgent. However, LWP has several different issues that make that option unworkable for now. Just for a comparison here is the LWP code that does extract the cookies.
my $lwp = LWP::UserAgent->new(cookie_jar => {}, timeout => 20, max_redirect => 1);
push #{ $lwp->requests_redirectable }, 'POST';
my $response = $lwp->get($url);
die $response->status_line unless $response->is_success;
$lwp->cookie_jar->scan(\&ScanCookies);
sub ScanCookies {
my ($version, $key, $value) = @_;
say "$key = $value";
}
So I know that I have the $url etc. correct.
Edit: I should mention that I'm using Strawberry Perl 5.14.
Edit2: I should also mention that the cookies are definitely getting into the user agent, because the session ID is handled properly. Unfortunately, I need to access another cookie (for testing the site) and I just don't seem to be able to get the right incantation to access it... so I believe this to be a programmer problem and nothing more.
Use this:
$tx->res->cookies
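A short sketch of what that call gives you, assuming a reasonably recent Mojolicious where cookies returns an array reference of Mojo::Cookie::Response objects. To keep it runnable offline, the response is parsed from a hand-written string rather than fetched; in real code `$res` would be `$tx->res`, and the cookie names here are hypothetical.

```perl
use strict;
use warnings;
use Mojo::Message::Response;

# Hand-written response standing in for $tx->res; cookie names are made up.
my $res = Mojo::Message::Response->new;
$res->parse(
    "HTTP/1.1 200 OK\x0d\x0a"
  . "Set-Cookie: session=abc123; path=/\x0d\x0a"
  . "Set-Cookie: theme=dark; path=/\x0d\x0a"
  . "Content-Length: 0\x0d\x0a\x0d\x0a"
);

# Each element has name and value accessors.
my %cookie_for = map { $_->name => $_->value } @{ $res->cookies };
print "$_ = $cookie_for{$_}\n" for sort keys %cookie_for;
```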

Why does CGI::Session new and load fail ( couldn't thaw() )?

I tried using the CGI::Session library, but for some reason my code won't keep a persistent session. It uses Moose for OOP, with Moose builders instantiating the _cgi and _sss (session) attributes of a My::Session object.
UPDATED CODE
My::Role::PersistsData
package My::Role::PersistsData;
use Moose::Role;
use namespace::autoclean;
has '_cgi' => (
is => 'rw',
isa => 'Maybe[CGI]',
builder => '_build_cgi'
);
has '_sss' => (
is => 'rw',
isa => 'Maybe[CGI::Session]',
builder => '_build_sss'
);
My::Session
package My::Session;
use Moose;
use namespace::autoclean;
with 'My::Role::PersistsData';
use CGI;
use CGI::Session ('-ip_match');
use CGI::Carp qw/fatalsToBrowser warningsToBrowser/;
sub start{
my($self) = @_;
my $cgi = $self->cgi();
$self->log("Session Started!");
}
sub cgi{
my($self) = @_;
# Moose accessors are not lvalues, so set the attribute by passing the value.
$self->_cgi( $self->_build_cgi() ) unless $self->_cgi;
return ($self->_cgi);
}
sub _build_cgi{
my($self) = @_;
my $cgi = CGI->new();
if(!$cgi){
#print "mising cgi";
}
return ( $cgi );
}
sub _build_sss{
my($self) = @_;
my $cgi = $self->cgi();
my $sid = $cgi->cookie("CGISESSID") || $cgi->param('CGISESSID') || undef;
$self->log("Session ID Initial is: ".($sid?$sid:"undef"));
my $sss = CGI::Session->new(undef, $cgi, {Directory=>'tmp'}) or die CGI::Session->errstr;
my $cookie = $cgi->cookie(CGISESSID => $sss->id() );
$self->log("Resulting Session ID is: ".$sid." cookie is: ".$cookie);
print $cgi->header( -cookie=>$cookie );
return ( $sss );
}
main.pl
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use My::Session;
$| = 1;
$, = " ";
$\ = "\n <br />";
my $sss = My::Session->new();
$sss->start();
print Dumper($sss);
It's pretty weird because the first time I run this I get an actual CGISESSION ID and I am able to carry it over on a page refresh...
however if I load the page again, suddenly the $sss (session) comes back as undefined, when it should return a new Session object:
$sss = new CGI::Session("driver:File", $sid, {Directory=>'/tmp'})
For some reason $sss is coming back as undefined, which means it didn't initiate a new session. A few tweaks to my code revealed this error:
new(): failed: load(): couldn't thaw() data using CGI::Session::Serialize::default:thaw(): couldn't thaw. syntax error at (eval 253) line 2, near "/>"
I also snooped around in CGI::Session.pm and found where this error is thrown. I guess it's not able to parse _DATA, or even read it, because of some strange characters: "/>"
CGI::Session.pm
....
$self->{_DATA} = $self->{_OBJECTS}->{serializer}->thaw($raw_data);
unless ( defined $self->{_DATA} ) {
#die $raw_data . "\n";
return $self->set_error( "load(): couldn't thaw() data using $self->{_OBJECTS}->{serializer} :" .
$self->{_OBJECTS}->{serializer}->errstr );
}
Any idea why this isn't working?
Most likely this is due to a different session cookie being sent (been there, hit that wall with head. HARD).
Please print the session cookie value being used to store the session initially as well as session cookie value supplied by subsequent request.
If they are indeed different, you have 2 options:
Investigate why different session cookie is sent by the browser in subsequent requests and fix that issue somehow.
I was never able to find conclusive answer but my app consisted of a frame with internal <iframe> so I suspect it was due to that.
If like me you can't find the root cause, you can also work around this.
My workaround: explicitly STORING the original session cookie value as a form variable being passed around 100% of your code pieces.
Then re-initialize session object with correct cookie value before your server side code requests session data.
Not very secure, annoying, and hard to get right. But it works. I wouldn't recommend it except as an uber-last-resort hack.
Perhaps you could try (or at least look at the code to see how it works) for some stateful webapp module. I have used Continuity, very cool stuff.
For some reason you can't use Data::Dumper or other HTML tags with CGI::Session
Answer found here and here
Removing Dumper and HTML output fixed this problem -- kind of --
updated
Apparently you have to use escapes
$cgi->escapeHTML ( Dumper($session) );
and that FINALLY resolves this problem.
Perl is a pain!