Cannot login with UserAgent - perl

I have managed to log in with the code below, but now I can only do it once a day.
After that I can't log in; I get the login page in the response instead.
But when I print $reqstr from the code below and paste it into a browser (like Firefox), I can log in.
Wget doesn't work either, only a normal browser.
Sometimes it seems that I'm logged in, but I only get content like this:
"<html>\cJ<head>\cJ\cI<meta http-equiv=\"content-type\" content=\"text/html; charset=ISO-8859-1\"><meta http-equiv=\"expires\" content=\"0\"><meta http-equiv=\"pragma\" content=\"no-cache\">\cJ\cI<meta http-equiv=\"refresh\" content=\"0; URL='https://www.address.com/'\">\cJ</head>\cJ</html>\cJ"
I also noticed that while I can't log in, I get this in the debugger:
_uri_canonical' => URI::https=SCALAR(0x17dad28)
-> REUSED_ADDRESS
'handlers' => HASH(0x22dc0c0)
'response_data' => ARRAY(0x22ee8b8)
0 HASH(0x22d9a48)
'callback' => CODE(0x22dba30)
-> &LWP::UserAgent::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:682] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:679-682
1 HASH(0x22eea08)
'callback' => CODE(0x22d9cb8)
-> &LWP::Protocol::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:138] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:135-138
Any clue?
Here is the code:
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request;
my $b = LWP::UserAgent->new(agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Firefox/31.2.0');
my $cookie_jar = HTTP::Cookies->new(
    file           => 'lwp_cookies.txt',
    autosave       => 1,
    ignore_discard => 1,
);
$cookie_jar->clear;
$cookie_jar->clear_temporary_cookies;
$b->cookie_jar($cookie_jar);
my $url = "https://www.address.com";
my $r = $b->get($url);
$r->decoded_content =~ /FORM ACTION="(.*?)" METHOD/msgi;
my $a = "$url$1";
print $a."\n";
my $reqstr = $a."&LoginAction=Login&Number=55555&KPassword=passw&UserID=uid";
my $req = HTTP::Request->new(POST => $reqstr);
$req->header('Host', 'www.address.com');
$req->header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0');
$req->header('Connection', 'keep-alive');
$req->header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
my $c = $b->request($req);

You need to re-request that page with the referrer set via the request's referer() method when using LWP::UserAgent (or see my second answer if you aren't wedded to that module).
sub login {    # Code not tested; a stub to adapt to your own login code
    my ($ua, $url, $referrer_url) = @_;
    # Build $req as in the question (HTTP::Request->new(POST => $url) plus the headers)
    my $req = HTTP::Request->new(POST => $url);
    $req->referer($referrer_url) if $referrer_url;
    return $ua->request($req);    # returns the HTTP::Response
}
my $result1 = login($b, $original_login_url);    # first try
# Obtain $redirect_url from $result1.
# If it was a 301 redirect, the intermediate responses are available via
# my @redirects = $result1->redirects;
my $referrer_url = $original_login_url;
my $result2 = login($b, $redirect_url, $referrer_url);
References:
http://forums.devshed.com/perl-programming-6/lwp-meta-refresh-tag-handling-63484.html
http://www.herongyang.com/Perl/LWP-UserAgent-Follow-HTTP-Redirects.html

If you aren't dead set on using LWP::UserAgent, use WWW::Mechanize instead.
Best approach: use WWW::Mechanize::Plugin::FollowMetaRedirect. The SYNOPSIS is pretty short and to the point:
use WWW::Mechanize;
use WWW::Mechanize::Plugin::FollowMetaRedirect;
my $mech = WWW::Mechanize->new;
$mech->get( $url );
$mech->follow_meta_redirect;
# Optionally, skip emulating the waiting time
$mech->follow_meta_redirect( ignore_wait => 1 );
If you don't have access to that module, you can create your own, similar to this: http://www.perlmonks.org/?node_id=487286
(Basically, parse the returned content using a regex (shudder) to extract the refresh URL, and get that URL. As per my other answer, you might need to add the referrer header.)
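A rough sketch of that manual approach with plain LWP classes, using the meta refresh page from the question as input. The helper name `meta_refresh_url` and the login URL are hypothetical, and the regex only handles the common `content="N; URL=..."` shape with `http-equiv` before `content`, so treat it as a starting point rather than a robust HTML parser. It builds the follow-up GET with a Referer header; the resulting request would then be passed to your user agent's `request()` method.

```perl
use strict;
use warnings;
use HTTP::Request;
use URI;

# Hypothetical helper: return the absolute target of a
# <meta http-equiv="refresh" content="N; URL=..."> tag, or undef if none.
# A regex like this is fragile; it only covers the common attribute order.
sub meta_refresh_url {
    my ($html, $base) = @_;
    return unless $html =~
        m{<meta\s+http-equiv=["']?refresh["']?\s+content=["']?\d+\s*;\s*url=['"]?([^'"\s>]+)}i;
    # Resolve relative targets against the page that contained the tag
    return URI->new_abs($1, $base)->as_string;
}

# The kind of page returned instead of the logged-in content (from the question)
my $html = q{<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">}
         . q{<meta http-equiv="refresh" content="0; URL='https://www.address.com/'"></head></html>};

my $login_url = 'https://www.address.com/login';    # hypothetical page we came from
my $next_url  = meta_refresh_url($html, $login_url);

# Follow the refresh by hand, telling the server where we came from
my $req = HTTP::Request->new(GET => $next_url);
$req->referer($login_url);
print $req->as_string;
# With a real LWP::UserAgent: my $res = $ua->request($req);
```

The content-type `<meta>` tag is skipped because the pattern insists on `http-equiv="refresh"`.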

Related

Unable to pass custom header using perl module HTTP::Request::Generator

I'm using Atom and testing out the HTTP::Request::Generator Perl module. The code below works for the most part, but I'm unable to send cookies or headers: it only displays the default headers, even though I have set them in my code.
use strict;
use warnings;
use HTTP::Request::Generator 'generate_requests';
use LWP::UserAgent;
my $ua = 'LWP::UserAgent'->new;
my $gen = generate_requests(
method => 'GET',
host => [ 'https://abc.ai/' ],
pattern => 'https://abc.ai',
headers => {
"User-Agent" => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
"Cookie" => '_abc',
},
wrap => sub {
my ( $req ) = @_;
# Fix up some values
$req->{'headers'}{'Content-Length'} = 666;
},
wrap => \&HTTP::Request::Generator::as_http_request,
);
while ( my $req = $gen->() ) {
my $response = $ua->request( $req );
# print $response->protocol, ' ', $response->status_line, "\n";
print $req->headers->as_string, "\n";
print $req->as_string();
# Do something with $response here?
if ($response->is_success) {
# print $response->decoded_content;
print $response ->header('title');
}
else {
die $response->status_line;
}
}
Output
User-Agent: libwww-perl/6.31
Login
The page title tells me I'm not logged in. The cookie is fine: I have tested it using curl, and I can log in manually and retrieve the required resource. Why is it failing in Perl, and how can I access my header options in the code above? Thanks.
Solution
body_params => {
comment => ['Some comment', 'Another comment, A++'],
},
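I got it solved by adding the code above.
You can't provide the same option (wrap) twice; presumably the options are collected into a hash, so the second one silently replaces the first: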
wrap => sub {
my ( $req ) = @_;
# Fix up some values
$req->{'headers'}{'Content-Length'} = 666;
},
wrap => \&HTTP::Request::Generator::as_http_request,
This may work though:
wrap => sub {
my ( $req ) = @_;
# Fix up some values
$req->{'headers'}{'Content-Length'} = 666;
return HTTP::Request::Generator::as_http_request( $req );
},
Also the headers option appears to take an arrayref of hashrefs, like this:
headers => [
{
"User-Agent" => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
"Cookie" => '_abc',
},
],
I guess the reason for that is so you can provide alternative sets of headers:
headers => [
{
"User-Agent" => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
"Cookie" => '_abc',
},
{
"User-Agent" => 'Mozilla/1.0 (Hoover Vacuum Cleaner)',
"Cookie" => '_def',
},
],
That way your request generator can generate two requests for each page, using different User-Agent strings, or different cookies (so logged in as different users), or different Accept headers, or whatever.
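For comparison, the same headers can be attached to a single HTTP::Request directly, without the generator, which makes it easy to confirm what will be sent before the request is handed to `$ua->request`. The URL and cookie value are the placeholders from the question; this is not the generator's own API.

```perl
use strict;
use warnings;
use HTTP::Request;

# Headers passed as an arrayref of name/value pairs
my $req = HTTP::Request->new(
    GET => 'https://abc.ai/',
    [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Cookie'     => '_abc',
    ],
);

# Inspect exactly which headers the request carries
print $req->as_string;
# Then send it: my $response = LWP::UserAgent->new->request($req);
```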

Read a web page with Perl

I am trying to read the content of a web page with perl on Windows 10. The code does not work for the following site:
https://www.dividendinvestor.com/dividend-quote/intc/
Here is the code I am using:
use LWP::Simple qw(get);
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
Any idea why I cannot read that page?
LWP::Simple is pretty basic and doesn't let you do anything clever like actually looking at the details of the response. So let's change to LWP::UserAgent and see what the response is.
use LWP::UserAgent;
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
my $resp = $ua->get($url);
print $resp->status_line;
This prints:
403 Forbidden
So I think that Quentin's comment is correct and that the site's owners are blocking people who use technology like LWP.
So let's change the user agent string to look like Internet Explorer.
use LWP::UserAgent;
my $agent = ' Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
print $resp->status_line;
Now I get:
200 OK
So we should be ok to get the content.
use LWP::UserAgent;
my $agent = ' Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
if ($resp->is_success) {
print $resp->content;
} else {
print $resp->status_line;
}
And that seems to work fine.
Note: Of course, changing the user agent string like this is rather dishonest. Presumably the site's owners have a good reason for wanting to dissuade people from accessing their site in this way, so don't annoy them by trying to get around their restrictions. Read the site's terms of service to see what they want you to do. Perhaps they have an API available that will give you the data you want.
As Dave Cross wrote, the problem is related to the user agent. It is possible to use the LWP::Simple module in this way:
use LWP::Simple qw/$ua get/;
$ua->agent('Mozilla/5.0');
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
As the documentation points out, the user agent created by this module (LWP::Simple) identifies itself as "LWP::Simple/#.##", so we can change it before the GET request.
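A small sketch of how `agent()` behaves on an LWP::UserAgent object (the kind of object LWP::Simple exports as `$ua`). By default it reports `libwww-perl/<version>`, which matches the `User-Agent: libwww-perl/6.31` output in the HTTP::Request::Generator question. A plain string replaces that, and, per the LWP::UserAgent documentation, a string ending in a space gets the default libwww-perl identifier appended. `MyScript/0.1` is a made-up name.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
print "default:  ", $ua->agent, "\n";    # libwww-perl/<version>

# A plain string replaces the default identifier completely
$ua->agent('Mozilla/5.0');
print "replaced: ", $ua->agent, "\n";

# A trailing space asks LWP to append its own identifier
$ua->agent('MyScript/0.1 ');
print "appended: ", $ua->agent, "\n";    # MyScript/0.1 libwww-perl/<version>
```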

Can't use concurrent asynchronous URLs with Net::Async::HTTP: it quits and doesn't go to the next URL

Using the concurrent asynchronous URL example for Net::Async::HTTP, the first bad URL (timeout, nonexistent host, etc.) causes the program to fail and exit completely, without continuing to the next URL in the array. Is the problem my code or the module?
I tried setting fail_on_error to 0, and even 1, but it had no obvious results.
#!/bin/perl
use IO::Async::Loop;
use Net::Async::HTTP;
use Future::Utils qw(fmap_void);
use strict;
use warnings;
use feature 'say';
my $ua_string = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36";
my $timeout = 10;
my $max_redirects = 10;
my $max_in_flight = 10;
my $max_connections_per_host = 10;
my $stall_timeout = 10;
my $max_recurse = "10";
my $max_per_host = "10";
my @URLs = ( "http://cnn.com", "http://google.com", "http://sdfsdfsdf24.com", "http://msn.net" );
my $loop = IO::Async::Loop->new();
my $http = Net::Async::HTTP->new();
$loop->add($http);
my $future = fmap_void {
my ( $url ) = @_;
$http->configure(user_agent => $ua_string);
$http->configure(timeout => $timeout );
$http->configure(max_redirects => $max_redirects);
$http->configure(max_in_flight => $max_in_flight);
$http->configure(max_connections_per_host => $max_connections_per_host);
$http->configure(stall_timeout => $stall_timeout);
$http->configure(fail_on_error => '0' );
$http->GET($url)->on_done(
sub {
my $response = shift;
say "Response: $response->code";
}
)->on_fail(
sub {
my $fail = shift;
say "Failed: $fail";
}
);
}
foreach => \@URLs;
$loop->await($future);
Your example works fine without a proxy. I tested it and made some changes:
say 'Fetching URL: ' . $url;
$http->GET($url)->on_done(
sub {
my $response = shift;
say "Response: ".$response->code();
}
)->on_fail(
sub {
my $fail = shift;
say "Failed: " . $fail;
}
);
Output:
Fetching URL: http://cnn.com
Response: 200
Fetching URL: http://google.com
Response: 302
Fetching URL: http://sdfsdfsdf24.com
Response: 403
Fetching URL: http://msn.net
Response: 200
This example doesn't run the requests concurrently: fmap_void is called without a concurrent argument, so the URLs are queued and processed one by one.
Behind the scenes, when you make a request to a target (in your case, some URLs), the low-level connection is made through a socket.
If there is a proxy between your script and the internet that the script isn't configured for, no connection is made, the request fails with an error, and your script stops, like this:
Fetching URL: http://cnn.com
Failed: Timed out
The variable $! is set and the error "Operation now in progress" appears: your request never established a connection, it only tried to establish one without success.
There are some points which you can check for example:
1 - Is the proxy working?
2 - Do I have an internet connection?
3 - Is the URL I am testing working?
If you are having problems with a proxy, your script needs a small adjustment (see the Net::Async::HTTP docs for details):
$http->configure( proxy_host => 'xx.xx.xx.xx');
$http->configure( proxy_port => 1234);
Once your proxy is configured, you can check whether you have full access to the internet by targeting URLs like those.
Requesting the URLs gives you a response code, and you can act on it depending on the code.
As an alternative solution you could use LWP::UserAgent to make simple requests and check the response code.
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
my $response = $ua->get('http://search.cpan.org/');
if ($response->is_success) {
print $response->decoded_content; # or whatever
}
else {
die $response->status_line;
}
Even so, with bad statuses such as 4xx, Net::Async::HTTP may not be the friendliest module for such a simple purpose, since it doesn't handle the failures the way you want.
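Another explanation for the question's symptom is worth ruling out. As documented for Future::Utils, the combined future returned by fmap_void fails as soon as one item's future fails, and `on_fail` only observes a failure, it doesn't recover from it. Connection-level errors such as an unresolvable host or a timeout presumably still fail the GET future even with `fail_on_error => 0`, since that option concerns 4xx/5xx responses. Chaining `->else(sub { Future->done })` turns each failure into a success, so the remaining URLs keep being processed. The sketch shows the pattern with plain Future objects; `fake_get` is a hypothetical stand-in for `$http->GET`, so it runs without a network.

```perl
use strict;
use warnings;
use Future;
use Future::Utils qw(fmap_void);

# Hypothetical stand-in for $http->GET($url): the bogus host fails the
# way a DNS error or timeout would, the others succeed with a status code.
sub fake_get {
    my ($url) = @_;
    return $url =~ /sdfsdfsdf24/ ? Future->fail('Timed out') : Future->done(200);
}

my @urls = ( 'http://cnn.com', 'http://sdfsdfsdf24.com', 'http://msn.net' );
my @log;

my $all = fmap_void {
    my ($url) = @_;
    fake_get($url)
        ->on_done( sub { push @log, "$url: $_[0]" } )
        # else() replaces a failed future with a successful one, so one
        # bad URL no longer fails the whole fmap_void
        ->else( sub { push @log, "$url: failed: $_[0]"; Future->done } );
} foreach => \@urls, concurrent => 2;

$all->get;    # with real I/O this would be $loop->await($all)
print "$_\n" for @log;
```

In the question's code, the same `->else(...)` call would go after the `on_fail` callback inside the fmap_void block.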

WWW::Mechanize ssl cross-site login failure cookie_jar not populating

The URL $url redirects to https://auth.outside.com/secure/login for authentication over SSL. The site stores some cookies as soon as you land on the page, and some more on successful authentication. However, the cookie file is not being populated, even when I manage to land on the page. This example uses Google, but the real URL is different.
CODE
#!/usr/bin/perl
use warnings;
use strict;
use WWW::Mechanize;
use Crypt::SSLeay;
use HTTP::Cookies;
my $userAgent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0';
my $cookie_file = 'auth_cookies.txt';
$ENV{HTTPS_PROXY} = 'http://myproxy.net:8080';
my $google='https://www.google.com';
my $url = $google;
my $tempfile='download_details';
my $mech = WWW::Mechanize->new(
noproxy => 0,
agent => $userAgent,
cookie_jar => HTTP::Cookies->new( file => $cookie_file )
);
my $result=$mech->get( $url, ':content_file' => $tempfile );
print sprintf( "User-Agent %s\n redirects to: %s\n\n", $userAgent, $mech->uri() );
print "result=$result\n";
outputs following:
User-Agent Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
redirects to: https://www.google.com
result=HTTP::Response=HASH(0x3474ef0)
But it does not create any cookie file, even though I can see a bunch of cookies in Firebug.
After adding this code, the file is populated:
$mech->cookie_jar->set_cookie(
qw(
3
cat
buster
/
.example.com
0
0
0
)
);
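Two documented properties of HTTP::Cookies are worth checking here. The jar only writes its file when `save` is called or when it was created with `autosave => 1` (the jar in the question's code has neither). Cookies flagged for discard, typically session cookies sent without an expiry date, as many login cookies are, are left out of the file unless `ignore_discard => 1` is set; the cookie added by hand above has no discard flag. The offline sketch below saves the same made-up session cookie with and without `ignore_discard`; the cookie name, domain and temporary file are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use HTTP::Cookies;

# Save a jar holding one session cookie (discard flag set, no expiry)
# and return the text that ended up in the cookie file.
sub saved_cookies {
    my ($ignore_discard) = @_;
    my $file = tempdir(CLEANUP => 1) . '/auth_cookies.txt';
    my $jar  = HTTP::Cookies->new(file => $file, ignore_discard => $ignore_discard);
    # Args: version, key, value, path, domain, port, path_spec, secure, maxage, discard
    $jar->set_cookie(0, 'session', 'abc123', '/', '.example.com', undef, 0, 0, undef, 1);
    $jar->save;    # without autosave => 1, the file is only written here
    open my $fh, '<', $file or die "Cannot read $file: $!";
    local $/;
    return scalar <$fh>;
}

print "ignore_discard => 0:\n", saved_cookies(0), "\n";
print "ignore_discard => 1:\n", saved_cookies(1), "\n";
```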

Reading Firefox cookie using LWP

I was trying to skip the login process on a website by reading the browser cookies (which I created by logging in with Firefox earlier). I exported them from Firefox using this Firefox add-on. The request gives a 200 OK response but returns the generic homepage instead of my custom 'logged in' home page. How do I make sure the cookie is passed to the server properly?
#!/usr/bin/perl
use strict ;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies::Netscape;
my @GHeader = (
'User-Agent' => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.19) Gecko/2010040200 Ubuntu/8.04 (hardy) Firefox/3.0.19',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language' => 'en-us,en;q=0.5',
'Accept-Charset' => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Encoding' => 'gzip,deflate',
'Keep-Alive' => '300',
'Connection' => 'keep-alive'
);
my $cookie_jar = HTTP::Cookies::Netscape->new(
file => "cookies.txt",
);
my $Browser = LWP::UserAgent->new;
$Browser->cookie_jar( $cookie_jar );
my ($OutLine,$response)=();
my $URL = 'http://www.hanggliding.org/';
printf("Get [%s]\n",$URL);
$response = $Browser->get($URL,@GHeader);
if($response->is_success)
{
if($response->status_line ne "200 OK")
{
printf("%s\n", $response->status_line);
}
else
{
printf("%s\n", $response->status_line);
$OutLine =$response->decoded_content;
open(my $html_fh, '>:encoding(UTF-8)', 'out.html') or die "Cannot write out.html: $!";
print {$html_fh} $OutLine;
close($html_fh);
}
}
else
{
printf("Failed to get url [%s]\n", $response->status_line);
}
You can inject a handler to access or modify request/response data during processing.
Quoting LWP::UserAgent's docs:
Handlers are code that injected at various phases during the processing of requests. The following methods are provided to manage the active handlers:
$ua->add_handler( $phase => \&cb, %matchspec )
Add handler to be invoked in the given processing phase. For how to specify %matchspec see "Matching" in HTTP::Config.
...
request_send => sub { my($request, $ua, $h) = @_; ... }
This handler gets a chance of handling requests before they're sent to the protocol handlers. It should return an HTTP::Response object if it wishes to terminate the processing; otherwise it should return nothing.
From there, you can inject a handler which will analyze the request object, but otherwise do nothing:
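use strict;
use warnings;
use LWP::UserAgent;
use Data::Dumper;
# request_send handler: dump the outgoing request (including any Cookie
# header the cookie jar added), then return nothing so that LWP goes on
# to send the request normally
sub dump_request {
    my ($request, $ua, $h) = @_;
    print Dumper($request);
    return;
}
my $browser = LWP::UserAgent->new;
$browser->add_handler(
    request_send => \&dump_request,
    m_method     => 'GET',    # only run for GET requests
);
$browser->get('http://www.google.com');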
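Besides dumping the request in a handler, the cookie file can be checked offline: load it into HTTP::Cookies::Netscape and ask the jar, via `add_cookie_header`, which Cookie header it would add to a request for the site. The sketch writes a tiny made-up cookies.txt to a temporary file; substitute the exported file. From reading the module's loader, it appears that entries whose expiry time is not in the future are not loaded at all, including an expiry of 0, which some exporters write for session cookies. The second, hypothetical line illustrates that case, and it is worth looking for in the real export.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Cookies::Netscape;
use HTTP::Request;

# Write a small cookies.txt in the exported (tab-separated Netscape) format:
# domain, include-subdomains, path, secure, expiry (epoch seconds), name, value.
# Both cookies are made up for this example.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "# Netscape HTTP Cookie File\n";
print {$fh} join("\t", '.hanggliding.org', 'TRUE', '/', 'FALSE', time + 3600, 'session_id', 'abc123'), "\n";
print {$fh} join("\t", '.hanggliding.org', 'TRUE', '/', 'FALSE', 0, 'old_session', 'xyz'), "\n";
close $fh;

my $jar = HTTP::Cookies::Netscape->new(file => $file);

# Ask the jar which Cookie header it would attach to a request for the site
my $req = HTTP::Request->new(GET => 'http://www.hanggliding.org/');
$jar->add_cookie_header($req);
print 'Cookie: ', ($req->header('Cookie') // '(none)'), "\n";
```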