The post request is as follows.
$bot->add_header(
'Host'=>'www.amazon.com',
'User-Agent'=>'application/json, text/javascript, */*',
'Accept'=>'application/json, text/javascript, */*',
'Accept Language'=>'en-us,en;q=0.5',
'Accept Encoding'=>'gzip, deflate',
'DNT'=>'1',
'Connection'=>'keep-alive',
'Content type'=>'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested with'=>'XMLHttpRequest',
'Referer'=>'https://www.amazon.com/gp/digital/fiona/manage?ie=UTF8&ref_=gno_yam_myk',
'Content length'=>'44',
'Cookie'=>'how do i put the cookie value');
Post parameters in my request :
sid-how do i get the session id.
new email-mailhost@mail.com
My code to logon:
use WWW::Mechanize;
use HTTP::Cookies;
use HTML::Form;
use WWW::Mechanize::Link;
my $bot = WWW::Mechanize->new();
$bot->agent_alias( 'Linux Mozilla' );
# Create a cookie jar for the login credentials
$bot->cookie_jar( HTTP::Cookies->new( file => "cookies.txt",
autosave => 1,
ignore_discard => 1, ) );
# Connect to the login page
my $response = $bot->get( 'https://www.amazon.com/gp/css/homepage.html/' );
# Get the login form. You might need to change the number.
$bot->form_number(3);
# Enter the login credentials.
$bot->field( email => '' );
$bot->field( password => '' );
$response = $bot->click();
#print $response->decoded_content;
$bot->get( 'https://www.amazon.com/gp/yourstore/home?ie=UTF8&ref_=topnav_ys' );
print $bot->content();
$bot->post('https://www.amazon.com/gp/digital/fiona/du/add-whitelist.html/ref=kinw_myk_wl_add', [sid => 'id', email=> 'v2@d.com']);
Data captured:
Host=www.amazon.com
User-Agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Accept=application/json, text/javascript, */*
Accept-Language=en-us,en;q=0.5
Accept-Encoding=gzip, deflate
DNT=1
Connection=keep-alive
Content-Type=application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With=XMLHttpRequest
Referer=https://www.amazon.com/gp/digital/fiona/manage?ie=UTF8&ref_=gno_yam_myk
Content-Length=39
Cookie=session-id-time=2082787201l; session-id
Pragma=no-cache
Cache-Control=no-cache
POSTDATA=sid=id&email=v%40d.com
Error Message-
Error POSTing https://www.amazon.com/gp/digital/fiona/du/add-whitelist.html/ref=
kinw_myk_wl_add: InternalServerError at logon.pl line 81
See post in WWW::Mechanize.
$bot->post($url, [sid => 'id', email => 'v@d.com']);
Related
I'm writing a simple Perl script that fetches some pages from different sites. It's very non-intrusive. I don't hog a server's bandwidth. It retrieves a single page without loading any extra JavaScript, images, or style sheets.
I use LWP::UserAgent to retrieve the pages. This works fine on most sites, but some sites return a "403 Forbidden" error. The same pages load perfectly fine in my browser. I have inspected the request headers from my web browser and copied them exactly when trying to retrieve the same page in Perl, and every single time I get a 403 error. Here's a code snippet:
use strict;
use LWP::UserAgent;
use HTTP::Cookies;
my $URL = "https://www.betsson.com/en/casino/jackpots";
my $browserObj = LWP::UserAgent->new(
ssl_opts => { verify_hostname => 0 }
);
# $browserObj->cookie_jar( {} );
my $cookie_jar = HTTP::Cookies->new();
$browserObj->cookie_jar( $cookie_jar );
$browserObj->agent( "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0");
$browserObj->timeout(600);
push @{ $browserObj->requests_redirectable }, 'POST';
my @header = ( 'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding' => 'gzip, deflate, br',
'Accept-Language' => 'en-US,en;q=0.5',
'Connection' => 'keep-alive',
'DNT' => '1',
'Host' => 'www.bettson.com',
'Upgrade-Insecure-Requests' => '1'
);
my $response = $browserObj->get( $URL, @header );
if( $response->is_success ) {
print "Success!\n";
} else {
print "Unsuccessfull...\n";
}
How do these servers distinguish between a real browser and my script? At first I thought they had some JavaScript trickery going on, but then I realized in order for that to work, the page has to be loaded by a browser first. But I immediately get this 403 Error.
What can I do to debug this?
While 403 is a typical response to bot detection, bot detection is not the cause of the problem in this case. The cause is a typo in your code:
my $URL = "https://www.betsson.com/en/casino/jackpots";
...
'Host' => 'www.bettson.com',
In the URL the domain name is www.betsson.com and this should be reflected in the Host header. But your Host header is slightly different: www.bettson.com. Since the Host header has the wrong name the request is rejected with 403 forbidden.
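One way to catch this kind of mismatch before sending anything is to compare the Host value you plan to send against the host that the URI module parses out of the URL. This is only a minimal sketch, assuming the URI module (installed alongside LWP) is available; the helper name host_matches_url is made up for illustration:

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: true if the Host header value equals the URL's host.
# Host names are case-insensitive, so compare in lower case.
sub host_matches_url {
    my ($url, $host_header) = @_;
    my $url_host = URI->new($url)->host;
    return lc($url_host) eq lc($host_header);
}

my $URL = "https://www.betsson.com/en/casino/jackpots";

# Check the misspelled value from the question and the corrected one.
for my $host ('www.bettson.com', 'www.betsson.com') {
    printf "%-16s %s\n", $host,
        host_matches_url($URL, $host) ? 'matches the URL' : 'does NOT match the URL';
}
```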
Actually, you do not need to go through all this trouble, since it looks like no bot detection is done at all. There is no need to set the user agent or fiddle with the headers; the plain version works:
my $browserObj = LWP::UserAgent->new();
my $response = $browserObj->get($URL);
I have been scratching my head trying to get LWP and HTTP::Request to actually pass a POST parameter to a web server. The web server sees that the request is a POST, but it does not pick up the passed parameters. I have been searching all day, have tried different things, and have yet to find something that works. (The web server is working: I am able to send POST requests manually, and when running the whole script I get a '200' status, but I do not see any posted elements.) Any help would be appreciated. Thanks.
my $ua2 = LWP::UserAgent->new;
$ua2->agent("Mozilla/5.0 (compatible; MSIE 6.0; Windows 98)");
my $req2 = HTTP::Request->new(POST => "$url", [ frm-advSearch => 'frmadvSearch' ]);
$req2->content_type('text/html');
my $res2 = $ua2->request($req2);
$http_stat = substr($res2->status_line,0,3);
my $res = $ua->post($url,
Content => [
'frm-advSearch' => 'frmadvSearch',
],
);
which is short for
use HTTP::Request::Common qw( POST );
my $req = POST($url,
Content => [
'frm-advSearch' => 'frmadvSearch',
],
);
my $res = $ua->request($req);
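To see the difference from the code in the question, it can help to build both requests and print them without sending anything. In the sketch below, the array reference passed as the third argument to HTTP::Request->new is taken as a list of headers, so the pair ends up as a header and the body stays empty, while POST from HTTP::Request::Common encodes the pair into the body. The URL is a placeholder, and the variable names $bad and $good are only for illustration:

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Request::Common qw( POST );

my $url = 'http://www.example.com/form/';   # placeholder URL

# The third argument of HTTP::Request->new is a list of headers,
# not form fields, so this request has no body.
my $bad = HTTP::Request->new( POST => $url, [ 'frm-advSearch' => 'frmadvSearch' ] );

# POST() URL-encodes the fields into the request body and sets
# the Content-Type to application/x-www-form-urlencoded.
my $good = POST( $url, Content => [ 'frm-advSearch' => 'frmadvSearch' ] );

print "--- HTTP::Request->new ---\n", $bad->as_string, "\n";
print "--- HTTP::Request::Common::POST ---\n", $good->as_string, "\n";
```

The code in the question also set the content type to text/html. Form fields are normally sent as application/x-www-form-urlencoded, which POST sets for you.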
Here's a Mojo::UserAgent example, which I find easier to debug:
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
$ua->transactor->name( 'Mozilla/5.0 (compatible; MSIE 6.0; Windows 98)' );
my $url = 'http://www.example.com/form/';
my $tx = $ua->post( $url, form => { 'frm-advSearch' => 'frmadvSearch' } );
say $tx->req->to_string;
The transaction in $tx knows about the request so I can look at that:
POST /form/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (compatible; MSIE 6.0; Windows 98)
Accept-Encoding: gzip
Host: www.example.com
Content-Length: 26
frm-advSearch=frmadvSearch
I'm trying to retrieve a page using LWP::UserAgent but I keep getting a "500 Internal Server Error" as a response. Retrieving the exact same page in Firefox (using a fresh "Private Window" - so without any cookies set yet) succeeds without a problem.
I've duplicated the headers exactly as sent by Firefox, but that still does not make a difference. Here's my full code:
use strict;
use LWP::UserAgent;
my $browserObj = LWP::UserAgent->new();
$browserObj->cookie_jar( {} );
$browserObj->timeout(600);
my @header = (
'Host' => 'www.somedomain.com',
'User-Agent' => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0',
'Accept-Language' => 'en-US,en;q=0.5',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding' => 'gzip, deflate, br',
'DNT' => '1',
'Connection' => 'keep-alive',
'Upgrade-Insecure-Requests' => '1'
);
my $URL = "https://www.somedomain.com";
my $response = $browserObj->get( $URL, @header );
if( $response->is_success ) {
print "Success!\n";
} else {
print "Error: " . $response->status_line . ".\n";
}
The real web address is something other than "www.somedomain.com". In fact, it's the URL of an online casino, but I don't want my question to be regarded as spam.
Does anyone have any idea what could be wrong?
On our corporate network which has a proxy (and an out of date perl version - there may be better options in newer versions) we tend to add the following for one-offs:
BEGIN {
$ENV{HTTPS_DEBUG} = 1; # optional but can help if you get a response
$ENV{HTTPS_PROXY} = 'https://proxy.server.here.net:8080';
}
If we don't do this the script simply fails to connect with no other information.
You may also want to add something like this if you want to inspect the messages:
$browserObj->add_handler("request_send", sub { shift->dump; return });
$browserObj->add_handler("response_done", sub { shift->dump; return });
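To try these handlers without touching a real server, the sketch below fetches a data: URL, which LWP resolves locally. This assumes your libwww-perl installation includes its data: protocol handler; the URL and its "Hello" payload are placeholders:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browserObj = LWP::UserAgent->new();

# Dump every outgoing request and every finished response.
$browserObj->add_handler("request_send",  sub { shift->dump; return });
$browserObj->add_handler("response_done", sub { shift->dump; return });

# A data: URL needs no network access; the payload after the comma
# becomes the response body.
my $response = $browserObj->get('data:text/plain,Hello');

print "Status: ", $response->status_line, "\n";
```

Once the dumps look right here, the same two add_handler lines can be used with the real URL to compare what the script sends with what the browser sends.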
This is the URL
https://trade.4over.com/orders/ajax/product_run_size.php?id_product=599983
I am trying to store its data using Mechanize. It returns a Forbidden error, but when I open it in a browser it gives a response.
I am using WWW::Mechanize module.
Here is the code that I am using
my $mech = new WWW::Mechanize;
$mech->add_header( 'User-agent' => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; nl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13');
$mech -> cookie_jar(HTTP::Cookies->new());
$mech->get($url);
my $result = $mech->submit_form(
form_number => 2,
fields =>
{
username => 'username', # Name of the input field and value
password => 'password',
},
button => 'log_in' # Name of the submit button
);
my $content = encode 'utf8',$mech->decoded_content;
return $content;
Just got the solution. I was doing it wrong.
I was submitting the form on this page, while the form is actually on the home page.
Now I submit the form on the home page and then use $mech->get for this URL.
It's working. Thanks for all your responses.
I once wrote a simple 'crawler' in Java to download HTTP pages for me.
Now I'm trying to rewrite the same thing in Perl, using the LWP module.
This is my Java code (which works fine):
String referer = "http://example.com";
String url = "http://example.com/something/cgi-bin/something.cgi";
String params= "a=0&b=1";
HttpState initialState = new HttpState();
HttpClient httpclient = new HttpClient();
httpclient.setState(initialState);
httpclient.getParams().setCookiePolicy(CookiePolicy.NETSCAPE);
PostMethod postMethod = new PostMethod(url);
postMethod.addRequestHeader("Referer", referer);
postMethod.addRequestHeader("User-Agent", " Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13");
postMethod.addRequestHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
postMethod.addRequestHeader("Content-Type", "application/x-www-form-urlencoded");
String length = String.valueOf(params.length());
postMethod.addRequestHeader("Content-Length", length);
postMethod.setRequestBody(params);
httpclient.executeMethod(postMethod);
And this is the Perl version:
my $referer = "http://example.com/something/cgi-bin/something.cgi?module=A";
my $url = "http://example.com/something/cgi-bin/something.cgi";
my @headers = (
'User-Agent' => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Referer' => $referer,
'Content-Type' => 'application/x-www-form-urlencoded',
);
my @params = (
'a' => '0',
'b' => '1',
);
my $browser = LWP::UserAgent->new( );
$browser->cookie_jar({});
$response = $browser->post($url, @params, @headers);
print $response->content;
The post request executes correctly, but I get another (main) webpage. As if cookies were not working properly...
Any guesses what is wrong?
Why am I getting different results from the Java and Perl programs?
You can also use WWW::Mechanize, which is a wrapper around LWP::UserAgent. It gives you the cookie jar automatically.
You want to be creating hashes, not arrays - e.g. instead of:
my @params = (
'a' => '0',
'b' => '1',
);
You should use:
my %params = (
a => 0,
b => 1,
);
When passing the params to the LWP::UserAgent post method, you need to pass a reference to the hash, e.g.:
$response = $browser->post($url, \%params, %headers);
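The effect of the reference can be checked offline by building the request with HTTP::Request::Common, which $ua->post uses internally. This is a rough sketch rather than the asker's exact code: it uses an array reference for the form so the field order is predictable, keeps only the Referer header for brevity, and the variable names $flat and $req are illustrative:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw( POST );

my $url     = 'http://example.com/something/cgi-bin/something.cgi';
my @params  = ( a => 0, b => 1 );
my %headers = (
    'Referer' => 'http://example.com/something/cgi-bin/something.cgi?module=A',
);

# Flattened lists: every key/value pair after the URL is read as a header,
# so 'a' and 'b' become headers and the body stays empty.
my $flat = POST( $url, @params, %headers );

# Form fields passed as ONE reference: they are encoded into the body,
# and only the pairs after the reference are treated as headers.
my $req = POST( $url, \@params, %headers );

print "--- flattened ---\n",      $flat->as_string, "\n";
print "--- with reference ---\n", $req->as_string,  "\n";
```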
You could also look at the request you're sending to the server with:
print $response->request->as_string;
You can also use a handler to automatically dump requests and responses for debugging purposes:
$ua->add_handler("request_send", sub { shift->dump; return });
$ua->add_handler("response_done", sub { shift->dump; return });
I believe it has to do with $response = $browser->post($url, @params, @headers);
From the doc of LWP::UserAgent
$ua->post( $url, \%form )
$ua->post( $url, \@form )
$ua->post( $url, \%form, $field_name => $value, ... )
$ua->post( $url, $field_name => $value,... Content => \%form )
$ua->post( $url, $field_name => $value,... Content => \@form )
$ua->post( $url, $field_name => $value,... Content => $content )
Since your params and headers are key/value pairs, I would store them as hashes and try this:
my $referer = "http://example.com/something/cgi-bin/something.cgi?module=A";
my $url = "http://example.com/something/cgi-bin/something.cgi";
my %headers = (
'User-Agent' => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Referer' => $referer,
'Content-Type' => 'application/x-www-form-urlencoded',
);
my %params = (
'a' => '0',
'b' => '1',
);
my $browser = LWP::UserAgent->new( );
$browser->cookie_jar({});
$response = $browser->post($url, \%params, %headers);