What is wrong in my usage of cookies with LWP::UserAgent?

With a real Firefox browser, authenticating through 'https://www.briefing.com/Login/subscriber.aspx' places a cookie on my hard drive. Afterwards, when I start another session, I can bypass the authentication step and directly reach a data page like 'https://www.briefing.com/investor/calendars/earnings/2018/08/09'.
In the code below, I am trying to duplicate the above, but to no avail. What am I doing wrong?
use strict;
use warnings;
use HTTP::Cookies;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Let $ua use cookies by giving it an empty cookie jar
$ua->cookie_jar( HTTP::Cookies->new() );

# Log in to create cookies.
# The field names are guessed, based on these web elements:
# <input name="_textBoxUserName" id="_textBoxUserName" type="text">
# <input name="_textBoxPassword" id="_textBoxPassword" type="password">
my $url = 'https://www.briefing.com/Login/subscriber.aspx';
my $res = $ua->post( $url,
    '_textBoxUserName' => '*******',
    '_textBoxPassword' => '*******',
);

# Confirm that new cookies are indeed placed in the cookie jar
print $ua->cookie_jar->as_string;

# But the URL still gets redirected even though the cookie jar has been filled
$url = 'https://www.briefing.com/investor/calendars/earnings/2018/08/09';
$res = $ua->get($url);
print $res->content;
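One detail worth checking in the code above: LWP::UserAgent's post() treats bare key => value pairs after the URL as header fields, not as form content, so the credentials are likely never sent as form data. Form fields belong in an array or hash reference. A minimal sketch of that calling convention (same guessed field names; note that an ASP.NET login page usually also expects hidden fields such as __VIEWSTATE, which a plain POST like this does not supply):

# Form fields go in an array reference; bare pairs would be sent as headers
my $login = $ua->post( $url, [
    '_textBoxUserName' => '*******',
    '_textBoxPassword' => '*******',
] );
print $login->status_line, "\n";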

Related

Save cookies after post method, perl, mechanize, redirect

I want to log in via a Perl script to http://www.tennisinsight.com/match_previews.htm and download the page. I am stuck at logging into the site via the script.
1. The site uses cookies to store login data.
2. The login form is triggered by JavaScript, but that is not important, because a simple local web page that contains only:
<form action="http://www.tennisinsight.com/myTI.php" method="POST">
<input name="username" type="text" size="25" />
<input name="password" type="password" size="25" />
<input name="mySubmit" type="submit" value="Submit!" />
</form>
given the right username and password, will send the needed data; the site will redirect to the main page with the user logged in, and the cookies are created. In short, a simple POST with the correct data does it all on the client side.
3. I have successfully fetched the page I need with curl, once the correct cookies were provided.
I think that posting to myTI.php, storing the returned cookies, and then opening the correct page while reading the cookies will do the trick, but I am failing at the cookie-saving part...
Here is the script I use to try to get the cookies; at the moment it prints them to stdout:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

my $username = "user";
my $password = "pass";
my $site_url = 'http://www.tennisinsight.com/myTI.php';

my $mech = WWW::Mechanize->new( autocheck => 1 );

my $response = $mech->post( $site_url,
    [
        'username' => $username,
        'password' => $password,
    ]
);

# Copy the cookies from the response into a jar and dump it
my $cookie_jar = HTTP::Cookies->new;
$cookie_jar->extract_cookies($response);
print $cookie_jar->as_string;
EDIT:
I have found examples of how to store cookies; the problem is that I get an empty file (or empty stdout). It seems the called PHP script redirects before the cookies are stored, and the login fails.
I am sorry, but I am new to Perl in general; it seems I am missing something.
I had the same problem: it looked like Mechanize was not passing cookies between requests. To debug, I had Mechanize write the cookies to disk:
my $mech = WWW::Mechanize->new( cookie_jar => HTTP::Cookies->new( file => "/path/to/cookies", autosave => 1 ) );
When I did that, I got a file with this as the contents (i.e. the "empty" cookies file):
#LWP-Cookies-1.0
As gangabass suggested, I changed my user agent
$mech->agent_alias('Linux Mozilla');
and then cookies started appearing in the file and they were passed between subsequent requests. Problem solved.
It was the agent_alias call that fixed it, not writing the cookies to disk.
According to the HTTP::Cookies module documentation, you can provide the following arguments to the constructor in order for the cookies to be stored on disk:
file => "/path/to/cookies"
autosave => 1
The load method is also available for loading the cookies back from disk.
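Putting those pieces together, a minimal sketch of a cookie jar that persists across runs (the file path is a placeholder):

use WWW::Mechanize;
use HTTP::Cookies;

# A jar tied to a file: existing cookies are loaded at construction,
# and autosave writes them back when the jar is destroyed
my $jar = HTTP::Cookies->new(
    file     => "/path/to/cookies",   # placeholder path
    autosave => 1,
);

my $mech = WWW::Mechanize->new( cookie_jar => $jar );
# ... log in and browse; $jar->load can re-read the file explicitly if needed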

Processing an external page with Perl CGI, or acting as a reverse proxy

There is a page residing on a local server running Apache. I would like to submit the form via a GET request with a single name/value pair, like:
id=item1234
This GET request has to be processed by another server, which I don't have control over, and which then returns a page that I would like to transform with a CGI script. In other words:
User submits form
MY Apache proxies to the external resource
EXTERNAL resource sends back a page
MY Apache transforms it with a CGI script (maybe another way?)
User gets a modified page
Again, this is more of an architectural question, so I'd be grateful for any hints; even a pointer to some guides would help, as I wasn't able to phrase my Google search well enough to locate anything related.
Thanks.
Pass the id "17929632" to this CGI code ("proxy.pl?id=17929632"), and you should see this exact page in your browser.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use CGI::Pretty qw(:standard -any -no_xhtml -oldstyle_urls);

print header;
print "<html>\n";
print " <head><title>Proxy Demo</title></head>\n";
print " <body bgcolor=\"white\">\n";

my $id = param('id') || die "No CGI param 'id'\n";

my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");

# Create a request
my $req = HTTP::Request->new( GET => "http://stackoverflow.com/questions/$id" );

# Pass request to the user agent and get a response back
my $response = $ua->request($req);

# Check the outcome of the response
if ( $response->is_success ) {
    my $content = $response->content;
    # Modify the original content here!
    print $content;
}
else {
    print $response->status_line;
}

print "</body></html>\n";
Vague question, vague answer: write your CGI program to include an HTTP user agent, e.g. LWP.

Logging into a website with LWP and Perl

Somewhat inexperienced programmer here, trying to write a program to log into my courses site and download all the content (lectures, homeworks, etc.). Obviously it is a password-protected site, so I have to supply credentials. I understand LWP::UserAgent and the like well enough, and that I need to use credentials. What I cannot figure out is how to get to the next page. I can go to the log-in, but how does Perl get the result of my log-in?
Code example (I pulled out the login info, obviously):
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $url = 'login URL';
$ua->credentials(
    $url,
    '',
    'user',
    'pass',
);
my $response = $ua->get($url);
print $response->content;
The content from the response is the same as what I would have gotten if I had not passed any credentials. Obviously I'm missing something here...
Oh, one other thing: my courses site does not have a unique URL as far as I know.
You probably want to be using WWW::Mechanize, a subclass of LWP::UserAgent designed to act more like a browser, allowing you to navigate through pages of a website with cookie storage already taken care of for you.
You only use credentials if the site uses HTTP basic auth, in which case you don't "log in", you just pass the credentials with every request.
If the site has a form-based login system, then you need to use a cookie_jar and request the form's action URI with whatever data it expects.
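A minimal WWW::Mechanize sketch of that form-based flow (the URL and field names are placeholders; read the real ones out of the login page's HTML):

use strict;
use warnings;
use WWW::Mechanize;

# Mechanize keeps an in-memory cookie jar for you
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get('https://courses.example.edu/login');   # placeholder URL

# Field names must match the form's <input name="..."> attributes
$mech->submit_form(
    with_fields => {
        username => 'myuser',   # placeholder field name
        password => 'mypass',   # placeholder field name
    },
);

print $mech->content;   # the logged-in page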
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new( timeout => 20 );
$ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.1.8) Gecko/20100202 MRA 5.5 (build 02842) Firefox/3.5.8');
$ua->requests_redirectable( [] );   # don't follow redirects automatically

my $cook = HTTP::Cookies->new;
$ua->cookie_jar($cook);

print requester( 'http://urlexample/login.php', 'login=yourlogin&password=pass' )->as_string;

sub requester {
    my ( $url, $body ) = @_;
    my $type = $body ? 'POST' : 'GET';
    my $req  = HTTP::Request->new( $type => $url );
    $req->content_type('application/x-www-form-urlencoded; charset=UTF-8');
    $req->content($body) if $body;
    return $ua->request($req);
}

Get files from a given URL matching a pattern, using Perl on Unix

I have been told that a given URL contains several XML and text files, and I need to download all the XML files starting with AAA (that is, AAA*.xml) inside a given directory.
Credentials to access that URL are provided to me.
Please note that the XML files could be gigabytes in size.
I have used the code below to achieve this:
use strict;
use warnings;
use LWP::UserAgent;

my $browser  = LWP::UserAgent->new;
my $username = 'scott';
my $password = 'tiger';

# Create HTTP request object
my $req = HTTP::Request->new( GET => "https://url.com/" );

# Authenticate the user
$req->authorization_basic( $username, $password );

# A filename as the second argument makes request() save the body to that file
my $res = $browser->request( $req, '/fold/AAA1.xml' );
print $res->status_line, "\n";
It prints a 200 OK status, but I am not able to get the file. Any suggestions?
If the server doesn't allow you to receive a folder listing (i.e., Apache without "Options +Indexes"), you will not be able to GET the collection of files.
But, having the list, you can filter it with a regex like /AAA.*/, and it's then easy to fetch each file; a sketch follows below.
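A minimal sketch of that approach, using LWP::UserAgent rather than LWP::Simple since the URL needs basic-auth credentials (the URL, target directory, and credentials are copied from the question; the link extraction is deliberately naive and assumes the server returns an HTML index page):

use strict;
use warnings;
use LWP::UserAgent;

my $base = 'https://url.com/';
my $ua   = LWP::UserAgent->new;

# Fetch the directory index
my $req = HTTP::Request->new( GET => $base );
$req->authorization_basic( 'scott', 'tiger' );
my $index = $ua->request($req);
die $index->status_line unless $index->is_success;

# Naively pull hrefs that look like AAA*.xml out of the index page
my @files = $index->decoded_content =~ /href="(AAA[^"]*\.xml)"/g;

for my $name (@files) {
    my $get = HTTP::Request->new( GET => $base . $name );
    $get->authorization_basic( 'scott', 'tiger' );
    # Stream each (possibly multi-GB) file straight to disk
    my $res = $ua->request( $get, "/fold/$name" );
    print "$name: ", $res->status_line, "\n";
}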

WWW::Mechanize Form Select

I am attempting to log in to YouTube with WWW::Mechanize and use forms() to print out all the forms on the page after logging in. My script is logging in successfully, and also successfully navigating to youtube.com/inbox; however, for some reason Mechanize cannot see any forms at youtube.com/inbox. It just returns blank. Here is my code:
#!"C:\Perl64\bin\perl.exe" -T
use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;
use WWW::Mechanize;
use Data::Dumper;
my $q = CGI->new;
$q->header();
my $url = 'https://www.google.com/accounts/ServiceLogin?uilel=3&service=youtube&passive=true&continue=http://www.youtube.com/signin%3Faction_handle_signin%3Dtrue%26nomobiletemp%3D1%26hl%3Den_US%26next%3D%252Findex&hl=en_US&ltmpl=sso';
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->agent_alias( 'Windows Mozilla' );
$mechanize->get($url);
$mechanize->submit_form(
form_id => 'gaia_loginform',
fields => { Email => 'myemail',Passwd => 'mypassword' },
);
die unless ($mechanize->success);
$url = 'http://www.youtube.com/inbox';
$mechanize->get($url);
$mechanize->form_id('comeposeform');
my $page = $mechanize->content();
print Dumper($mechanize->forms());
Mechanize is unable to see any forms at youtube.com/inbox; however, as I said, I can print all of the forms from the initial link, no matter what I change it to...
Thanks in advance.
As always, one of the best debugging approaches is to print what you get and check if it is what you were expecting. This applies to your problem too.
In your case, if you print $mechanize->content() you'll see that you didn't get the page you're expecting. YouTube wants you to follow a JavaScript redirect in order to complete your cross-domain login action. You have multiple options here:
parse the returned content manually, e.g. with /location\.replace\("(.+?)"/ (see the sketch below)
try to have your code parse the JavaScript (have a look at WWW::Scripter)
[recommended] use the YouTube API for managing your inbox
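A hypothetical sketch of the first option, following on from the script above (the redirect format is whatever Google emitted at the time, and the escape handling is simplified):

# After submit_form: look for a JavaScript redirect in the returned page
if ( $mechanize->content =~ /location\.replace\("(.+?)"\)/ ) {
    my $next = $1;
    # Unescape \xNN sequences embedded in the JavaScript string
    $next =~ s/\\x([0-9A-Fa-f]{2})/chr hex $1/ge;
    $mechanize->get($next);   # complete the cross-domain login
}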