POSTing to Google Sheets with Perl LWP

I would like to add worksheets to an existing Google spreadsheet, but am not getting very far. The code below does not work for me. Is the POST request incorrect?
Note: my spreadsheet is indeed public and published on the web; this is confirmed by the fact that GET requests against it succeed.
link to Google documentation
use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;
my $agent = LWP::UserAgent->new;
my $key = "some key";
my $url = "https://spreadsheets.google.com/feeds/worksheets/$key/public/full"
my $xml = join "\n",
'<entry xmlns="http://www.w3.org/2005/Atom"',
'xmlns:gs="http://schemas.google.com/spreadsheets/2006">',
'<title>Expenses</title>',
'<gs:rowCount>50</gs:rowCount>',
'<gs:colCount>10</gs:colCount>',
'</entry>';
my $response = $agent->post(
$url,
'Content-Type' => 'application/atom+xml',
'Content' => $xml
);
$response->is_success && say "OK";
$response->is_error && say "error";

According to the Google documentation (link to Google docs), the correct URL is https://spreadsheets.google.com/feeds/worksheets/key/private/full.
Try changing your URL from https://spreadsheets.google.com/feeds/worksheets/$key/public/full to https://spreadsheets.google.com/feeds/worksheets/$key/private/full; the public feed is read-only, so additions have to go through the private (authenticated) feed.
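For what it's worth, a minimal sketch of what the corrected request might look like. The private feed requires authentication, so the access token below is a placeholder you would obtain yourself (e.g. via OAuth); it is not part of the original answer:
use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;
my $agent = LWP::UserAgent->new;
my $key   = "some key";
my $token = "some access token";   # placeholder: obtained via OAuth, not shown here
my $url   = "https://spreadsheets.google.com/feeds/worksheets/$key/private/full";
my $xml = join "\n",
    '<entry xmlns="http://www.w3.org/2005/Atom"',
    '  xmlns:gs="http://schemas.google.com/spreadsheets/2006">',
    '<title>Expenses</title>',
    '<gs:rowCount>50</gs:rowCount>',
    '<gs:colCount>10</gs:colCount>',
    '</entry>';
my $response = $agent->post(
    $url,
    'Content-Type'  => 'application/atom+xml',
    'Authorization' => "Bearer $token",   # assumption: an OAuth 2.0 bearer token
    'Content'       => $xml,
);
say $response->is_success ? "OK" : "error: " . $response->status_line;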


HTML::TableExtract an HTTPS site

I've created a perl script to use HTML::TableExtract to scrape data from tables on a site.
It works great for dumping table data from unsecured sites (i.e. HTTP sites), but when I try HTTPS sites it doesn't work (the tables_report line just prints blank; it should print a bunch of table data).
However, if I take the content of that HTTPS page, save it to an HTML file, and post it on an unsecured HTTP site (changing my script to point at this HTTP page), the script works as expected.
Anyone know how I can get this to work over HTTPS?
#!/usr/bin/perl
use lib qw( ..);
use HTML::TableExtract;
use LWP::Simple;
use Data::Dumper;
# DOESN'T work:
my $content = get("https://datatables.net/");
# DOES work:
# my $content = get("http://www.w3schools.com/html/html_tables.asp");
my $te = HTML::TableExtract->new();
$te->parse($content);
print $te->tables_report(show_content=>1);
print "\n";
print "End\n";
The sites mentioned above for $content are just examples; they aren't really the sites I'm extracting, but they behave just like the site I'm actually trying to scrape.
One option, I guess, is to use Perl to download the page locally first and extract from there, but I'd rather not if there's an easier way (to anyone who helps: please don't spend a crazy amount of time coming up with a complicated solution!).
The problem is the user agent that LWP::Simple uses, which is blocked by that site. Use LWP::UserAgent and set an allowed user agent, like this:
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $url = 'https://datatables.net/';
$ua->agent("Mozilla/5.0"); # set user agent
my $res = $ua->get($url); # send request
# check the outcome
if ($res->is_success) {
# ok -> I simply print the content in this example, you should parse it
print $res->decoded_content;
}
else {
# ko
print "Error: ", $res->status_line, "\n";
}
This is because datatables.net is blocking LWP::Simple requests. You can confirm this with the code below:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
print is_success(getprint("https://datatables.net/"));
Output:
$ perl test.pl
403 Forbidden <URL:https://datatables.net/>
You could try using LWP::RobotUA. The code below works fine for me.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::TableExtract;
my $ua = LWP::RobotUA->new( 'bot_chankey/1.1', 'chankeypathak@stackoverflow.com' );
$ua->delay(5/60); # 5 second delay between requests
my $response = $ua->get('https://datatables.net/');
if ( $response->is_success ) {
my $te = HTML::TableExtract->new();
$te->parse($response->content);
print $te->tables_report(show_content=>1);
}
else {
die $response->status_line;
}
In the end, a combination of Miguel's and Chankey's responses provided my solution. Miguel's made up most of my code, so I selected that as the answer, but here is my "final" code (there's a lot more to do, but this was all I couldn't figure out; the rest should be no problem).
I couldn't quite get either answer to work exactly as posted, but they got me 99% of the way; I just had to figure out how to get around the error "certificate verify failed". I found that fix with Miguel's method right away, so in the end I mostly used his code, but both responses were great!
#!/usr/bin/perl
use lib qw( ..);
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TableExtract;
use LWP::RobotUA;
use Data::Dumper;
my $ua = LWP::UserAgent->new(
ssl_opts => { SSL_verify_mode => 'SSL_VERIFY_PEER' },
);
my $url = 'https://WebsiteIUsedWasSomethingElse.com';
$ua->agent("Mozilla/5.0"); # set user agent
my $res = $ua->get($url); # send request
# check the outcome
if ($res->is_success)
{
my $te = HTML::TableExtract->new();
$te->parse($res->content);
print $te->tables_report(show_content=>1);
}
else {
# ko
print "Error: ", $res->status_line, "\n";
}
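For reference, two other ssl_opts variations people commonly reach for when hitting the "certificate verify failed" error. This is only a sketch, and the second variant disables verification entirely, so it is appropriate for testing only:
use strict;
use warnings;
use LWP::UserAgent;
use Mozilla::CA;   # provides a CA bundle (may need to be installed from CPAN)
# Variant 1: let verification succeed by pointing IO::Socket::SSL at a CA file.
my $ua_verified = LWP::UserAgent->new(
    ssl_opts => { SSL_ca_file => Mozilla::CA::SSL_ca_file() },
);
# Variant 2 (testing only): skip certificate verification altogether.
my $ua_insecure = LWP::UserAgent->new(
    ssl_opts => { verify_hostname => 0, SSL_verify_mode => 0 },
);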
my $url = "https://ohsesfire01.summit.network/reports/slices";
my $user = 'xxxxxx';
my $pass = 'xxxxxx';
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(GET => $url);
# authenticate
$request->authorization_basic($user, $pass);
my $page = $ua->request($request);
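The snippet above doesn't check the result; $page is an ordinary HTTP::Response object, so the same kind of check used earlier applies:
if ($page->is_success) {
    print $page->decoded_content;
}
else {
    print "Error: ", $page->status_line, "\n";
}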

Perl print the redirected url

I want to print the redirected URL in Perl.
Input url : http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv
output url : http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia
use LWP::UserAgent qw();
use CGI qw(:all);
print header();
my ($url) = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
print $res->request;
How can I get this done in Perl?
You need to examine the HTTP response to find the URL. The documentation of HTTP::Response gives full details of how to do this, but to summarise, you should do the following:
use strict;
use warnings;
use feature ':5.10'; # enables "say"
use LWP::UserAgent;
my $url = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
# you should add a check to ensure the response was actually successful:
if (! $res->is_success) {
say "GET failed! " . $res->status_line;
}
# show the base URI for the response:
say "Base URI: " . $res->base;
You can view redirects using HTTP::Response's redirects method:
if ($res->redirects) { # are there any redirects?
my @redirects = $res->redirects;
say join(", ", @redirects);
}
else {
say "No redirects.";
}
In this case, the base URI is the same as $url, and if you examine the contents of the page, you can see why.
# print out the contents of the response:
say $res->decoded_content;
Right near the bottom of the page, there is the following code:
$(window).load(function() {
window.setTimeout(function() {
window.location = "http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia"
}, 300);
});
The redirect is handled by javascript, and so is not picked up by LWP::UserAgent. If you want to get this URL, you will need to extract it from the response contents (or use a different client that supports javascript).
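If scraping the URL out of the page is good enough for your case, a simple pattern match over the decoded content will do. A minimal sketch, assuming the target always appears in a window.location = "..." assignment like the one above:
my ($redirect_url) = $res->decoded_content =~ /window\.location\s*=\s*"([^"]+)"/;
say defined $redirect_url ? $redirect_url : "No javascript redirect found.";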
On a different note, your script starts off like this:
use LWP::UserAgent qw();
The code following the module name, qw(), is used to import particular subroutines into your script so that you can use them by name (instead of having to refer to the module name and the subroutine name). If the qw() is empty, it's not doing anything, so you can just omit it.
To have LWP::UserAgent follow redirects, just set the max_redirect option:
use strict;
use warnings;
use LWP::UserAgent qw();
my $url = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new( max_redirect => 5 );
my $res = $ua->get($url);
if ( $res->is_success ) {
print $res->decoded_content; # or whatever
} else {
die $res->status_line;
}
However, that website is using a JavaScript redirect.
$(window).load(function() {
window.setTimeout(function() {
window.location = "http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia"
}, 300);
});
This will not work unless you use a framework that enables JavaScript, like WWW::Mechanize::Firefox.
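A rough sketch of the WWW::Mechanize::Firefox route (it drives a running Firefox instance through the MozRepl extension, so both have to be installed); because the redirect fires from a 300 ms timer, the script waits briefly before reading the final URI:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get("http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv");
sleep 2;                  # give the window.setTimeout redirect time to fire
print $mech->uri, "\n";   # URI of the page Firefox ended up on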
The last line, $res->request, won't give you what you want: it returns the HTTP::Request object (which prints as a hash reference) rather than the page content. Print $res->content instead, as below:
use LWP::UserAgent qw();
use CGI qw(:all);
print header();
my ($url) = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
print $res->content;

Parse data from freebase with perl

I have a problem parsing data from Freebase with the crawler I'm writing in Perl.
I'm trying to pull out data from this URL:
(example)
http://www.freebase.com/authority/imdb/title?ns&lang=en&filter=%2Ftype%2Fnamespace%2Fkeys&timestamp=2013-11-20&timestamp=2013-11-21
It is a page with IMDB IDs and MIDs, and I'm trying to extract the links. The problem is that I only get 100 results; when I reach the bottom of the page in Mozilla Firefox, more results (11 more) are loaded. I'm using LWP::UserAgent.
Does anybody know a solution, with some sample code, for automatically pulling out all 111 MID links from this page?
Here is my code:
#!/usr/bin/perl
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;
$URL = 'http://www.freebase.com/authority/imdb/title?ns&lang=en&filter=%2Ftype%2Fnamespace%2Fkeys&timestamp=2013-11-20&timestamp=2013-11-21'; # URL
$browser = LWP::UserAgent->new();
$browser->timeout(10);
my $request = HTTP::Request->new(GET => $URL);
my $response = $browser->request($request);
if ($response->is_error()) {printf "%s\n", $response->status_line;}
$contents = $response->content();
my ($page_parser) = HTML::LinkExtor->new(undef, $URL);
$page_parser->parse($contents)->eof;
@links = $page_parser->links;
foreach $link (@links) {
$_ = $$link[2];
# if (index($$link[2], $_) != -1) {
$mid = $$link[2];# if m/http:\/\/www\.freebase\.com\/m\//;
#$mid =~ s/\?links=//;
#$mid =~ s/http:\/\/www.freebase.com\///;
print "MID $mid\n";
}
Crawling freebase.com will likely get you blocked. As was mentioned in the comments, Freebase offers both a RESTful JSON API for light/medium duty use or interactive queries and a bulk download of the entire database for heavy consumers.
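The API route boils down to fetching JSON with LWP::UserAgent and decoding it. The sketch below only illustrates that pattern; the endpoint, the MQL query, and the paging details are placeholders you would take from the Freebase API documentation, not from this answer:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use JSON;
# Placeholder endpoint and query: consult the Freebase API docs for the real MQL.
my $uri = URI->new('https://www.googleapis.com/freebase/v1/mqlread');
$uri->query_form(query => '[{"id": null, "key": [{"namespace": "/authority/imdb/title", "value": null}], "limit": 100}]');
my $ua  = LWP::UserAgent->new(timeout => 10);
my $res = $ua->get($uri);
die $res->status_line unless $res->is_success;
my $data = decode_json($res->decoded_content);
for my $item (@{ $data->{result} || [] }) {
    print "$item->{id}\n";
}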

perl - Facebook Graph - Post to Fanpage

This is not giving me any errors, yet it is not posting to my fan page wall.
I would love some help with this; trying to figure it out on my own has been a roller coaster.
Tested with the correct token, of course.
#!/usr/bin/perl
use strict;
use warnings;
use open qw(:std :utf8);
use LWP::Simple;
use YAML::Tiny;
use JSON;
use URI;
use utf8;
my $access_token = 'blah';
my $profile_id = '200117706712975';
#Publish to a facebook page as admin
graph_api('/' . $profile_id . '/feed',{
access_token => $access_token,
message => 'this is a test!',
link => 'http://test.com',
method => 'post'
});
exit 0;
sub graph_api {
my $uri = new URI('https://graph.facebook.com/' . shift);
$uri->query_form(shift);
my $resp = get("$uri");
return defined $resp ? decode_json($resp) : undef;
}
If you dump the response object, $resp, using Data::Dumper or similar, you will see the reason the post isn't appearing. The response body is JSON.
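For example, something along these lines (note also that graph_api returns undef when LWP::Simple's get fails outright, so that case is worth handling as well):
use Data::Dumper;
my $result = graph_api('/' . $profile_id . '/feed', {
    access_token => $access_token,
    message      => 'this is a test!',
    link         => 'http://test.com',
    method       => 'post',
});
die "request failed outright (get returned undef)\n" unless defined $result;
print Dumper($result);
# A rejected post usually comes back with an "error" hash;
# $result->{error}{message} typically explains why it was not published.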

Problems when downloading .csv files from Google insight for search using Perl

I am trying to use Perl to download .csv files from Google Insights for Search, but I have run into two problems:
It seems that the download URL is a redirect, so I cannot download the file with the LWP module.
The URL is
"http://www.google.com/insights/search/overviewReport?q=dizzy&date=1%2F2012%205m&cmpt=date&content=1&export=1". You may try it; you probably need to log in first.
It seems that I have to store the session before downloading. Without doing this, I get a warning like "reached the quota limit".
How can I download this .csv file automatically using Perl? Thanks for the help.
Here is my code:
use strict;
use warnings;
use LWP::UserAgent;
# create userAgent object
my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");
#create a request
my $req = HTTP::Request->new(GET => 'http://www.google.com/insights/search/overviewReport?q=dizzy&date=1%2F2012%205m&cmpt=date&content=1&export=1');
my $res = $ua->request($req);
#check the outcome of the response
if($res->is_success) {
print $res->content;
}
else {
print $res->status_line, "\n";
}
I highly recommend using WWW::Mechanize for web automation (it is a high-level wrapper around LWP::UserAgent):
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->agent_alias("Windows IE 6");
$mech->get("https://accounts.google.com/serviceLogin");
$mech->submit_form(
form_id => "gaia_loginform",
fields => {
Email => 'gangabass@gmail.com',
Passwd => 'password',
},
button => "signIn",
);
$mech->get("http://www.google.com/insights/search/overviewReport?q=dizzy&date=1%2F2012%205m&cmpt=date&content=1&export=1");
open my $fh, ">:encoding(utf8)", "report.csv" or die $!;
print {$fh} $mech->content();
close $fh;
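As a side note, WWW::Mechanize can also write the current page straight to disk with $mech->save_content("report.csv"), which saves the manual open/print/close at the end.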