How to tell if a webpage exists? - perl

Just for fun, I'm writing a Perl program to check if a given website exists. For my purposes, a website exists if I can go into my browser, punch in the url and get a meaningful webpage (meaning not an error or "failed to open page" message). What would be the best way to go about doing this? Eventually I would like to be able to give my program a list of hundreds of urls.
I'm thinking about just pinging each of the URLs on my list to see if they exist. However, I don't know much about networking, so is this the best way to do it?

Using Library for WWW in Perl (LWP):
#!/usr/bin/perl
use LWP::Simple;
my $url = 'http://www.mytestsite.com/';
if (head($url)) {
    print "Page exists\n";
} else {
    print "Page does not exist\n";
}

There is no protocol for "pinging" a web page to check that it exists; ping only tests whether a host is reachable. You have to request the resource, and if it is served, it exists. There are several ways to go about it; here are a couple:
Retrieving web pages with LWP
Checking for an existing web page could be as simple as:
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::Simple qw(head);
head('http://www.perlmeme.org') or die 'Unable to get page';
The same solution as a command-line tool is lwp-request (or its HEAD alias). A HEAD request returns only the resource headers, such as the content size, and will be quicker than getting all the page contents.
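Since the question mentions a list of hundreds of URLs, here is a minimal sketch of how the HEAD check might be extended to a whole list. It assumes LWP::UserAgent is installed; the helper name page_exists, the 10-second timeout, and the fallback to GET when HEAD fails are my own choices, not something from the answers above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: a URL "exists" if the final response is a 2xx.
# LWP follows redirects for HEAD and GET by default.
sub page_exists {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    # Some servers reject HEAD, so retry with a full GET before giving up.
    $response = $ua->get($url) unless $response->is_success;
    return $response->is_success ? 1 : 0;
}

# Run the check only when a file of URLs (one per line) is named
# as the first command-line argument.
if (my $list = shift @ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 10);
    open my $fh, '<', $list or die "Cannot open $list: $!";
    while (my $url = <$fh>) {
        chomp $url;
        next unless length $url;
        printf "%s\t%s\n", $url, page_exists($ua, $url) ? 'exists' : 'missing';
    }
    close $fh;
}
```

Called as "perl check_urls.pl urls.txt" (both file names are placeholders), it prints one line per URL. Note that a 2xx status only tells you the server returned something; a site that serves a friendly error page with status 200 would still count as existing.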

Related

How to get argument when POSTed to URL using Perl (Mojolicious)

I am setting up Duo two-factor authentication in a Perl web application (using Mojolicious). I am new to Perl, so this may have a simple answer.
I have everything set up except I need to verify the response by doing the following:
"After the user authenticates (e.g. via phone call, SMS passcode, etc.) the IFRAME will generate a signed response called sig_response and POST it back to the post_action URL. Your server-side code should then call verify_response() to verify that the signed response is legitimate."
In Perl, how can you get sig_response? Is there a module? Below is an example using Python:
sig_response = self.get_argument("sig_response") # for example (if using Tornado: http://www.tornadoweb.org/en/stable/documentation.html)
Duo Web: https://duo.com/docs/duoweb
It looks like this sig_response is just a value that's POSTed to your response handler. When you created the URL to show in the iframe, it had a post_action parameter. That's the endpoint in your application that handles the user coming back from the iframe.
You need to build a route in Mojo for that. There, you need to look at the parameters you are receiving in the POST body. I don't know if it's form-data or something else, like JSON. It doesn't really say in the documentation. I suggest you dump the parameters, and if that doesn't show it, dump the whole request body.
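To illustrate the "dump the parameters" suggestion, here is a rough sketch of a Mojolicious::Lite route that logs whatever arrives. The path /duo_handler is a placeholder for whatever your post_action points to, and the rendered text is only there so you can see the result in a browser or test; none of this comes from the Duo documentation.

```perl
#!/usr/bin/env perl
use Mojolicious::Lite;

# '/duo_handler' is a placeholder; use whatever path post_action points to.
post '/duo_handler' => sub {
    my $c = shift;

    # All submitted parameters (query string and form body) as a hash.
    $c->app->log->debug($c->dumper($c->req->params->to_hash));

    # The raw request body, in case the payload is not form-encoded.
    $c->app->log->debug($c->req->body);

    my $sig_response = $c->param('sig_response');
    $c->render(text => defined $sig_response ? 'got sig_response' : 'no sig_response');
};

app->start;
```

Start it with the daemon command (for example "perl duo_dump.pl daemon", where the file name is made up) and watch the log while completing the Duo prompt.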
Once you have that sig_response parameter, you need to call the verify_response function that duo's library provides and look at the return value.
If you have not done it yet, get the SDK at https://github.com/duosecurity/duo_perl. It's not a full distribution. You can either clone the whole thing or just download the pm file from their github and put it in the lib directory of your application. Then use it like you would any other module. Since it doesn't export anything, you need to use the fully qualified name to call the verify_response function.
The whole thing might look something like this untested code:
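post '/duo_handler' => sub {
    my $c = shift;
    # sig_response is the signed value POSTed back by the Duo iframe
    my $sig_response = $c->param('sig_response');
    # $ikey, $skey and $akey are your Duo keys, defined elsewhere
    my $user = DuoWeb::verify_response($ikey, $skey, $akey, $sig_response);
    if ($user) {
        # logged in
    } else {
        # not logged in
    }
};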
Disclaimer: I don't know this service. I have only quickly read the documentation you linked, and taken a look at their Perl SDK, which they should really put on CPAN.

Perl Redirect CGI File To A "file://" Address

I am a newbie Perl programmer so please go easy on me. Is there any way to redirect a user from a Perl .cgi file to another file? The user will click a link which leads to a .cgi file. The .cgi file will then redirect the user to a "file://" location. The purpose is to download a file from another server after logging the click for a click count.
The code below works for "http://" addresses, but how do I get it to work with "file://" addresses?
#!C:\Strawberry\perl\bin
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
my $page = new CGI;
# print header and start the markup output
print "HTTP/1.0 200 OK\n";
print $page->header( "text/html" );
print $page->start_html( "CGI Environment" );
print "Hello world! ";
print "Press the Enter key to exit.\n";
my $url="file://location_of_file";
my $t=1; # time until redirect activates
print "<META HTTP-EQUIV=refresh CONTENT=\"$t;URL=$url\">\n";
print $page->end_html;
# end code
Is this a good approach? Is there a better approach? Please help. Thank you in advance!
This is not permitted for security reasons. All browsers (that I know of) will refuse a meta redirect from an http:// URL to a file:// URL. The same restriction applies to 30x server redirects. See the Redirect Restrictions section of the Browser Security Handbook for more information.
Probably the most useful way would be to use '.url' files (static or dynamically generated), as explained nicely in JFish222's post.
Otherwise, I'm afraid you'd be stuck with elevating privileges (strongly browser dependent), or playing with browser extensions, etc.

how to use onclick with href on cgi perl

I have a list of files on a page, and next to each file there is a link that says "delete". When the user clicks the delete link, it should pass the file name to a function in the same script, which deletes the file from the server, and the user should stay on the same page. Any idea how to do this?
#some other stuff goes here such list of files
print "<TD><a onclick='deleteFile()' href='#'>delete</a> </td>";
sub deleteFile()
{
unlink ($file);
}
I also tried pure CGI Perl. When I click the delete link it prints an "Internal Error", but when I check, the file has actually been deleted, so there is no permission issue here (otherwise unlink would have failed). Here is what I changed it to:
print "<a href='../cgi-bin/deleteFile.cgi?param1=$dir&param2=$file'>delete</a>";
Here is what I have in deleteFile.cgi: I get both param1 and param2 and use unlink like below:
unlink($location);
You really haven't tried hard enough to find your own solution here. I will give you some pointers ...
The onclick attribute in the HTML will trigger Javascript to be run in the browser (there are better ways to make a click event run Javascript code).
None of the Perl code in your CGI script will run unless the browser sends a request to the CGI script on the server. Things that could generate a request include:
- the user clicking a link with an href that points to the CGI script (perhaps with the file pathname in a querystring parameter)
- the user clicking a submit button in a form with an action that points to the CGI script (perhaps with the file pathname in a hidden form field)
- some Javascript code in the browser that issues an AJAX request to the CGI script (with the file pathname as a POST parameter)
Clicking a link would result in a GET request. It is generally considered bad practice to run code that changes the state of the server (e.g. deleting a file) in response to a GET request.
A form submission or an AJAX request can cause a POST request. You could even explicitly use a DELETE request via AJAX. These are more appropriate request methods to use for mutating server state.
Even when you get your code working, it will only be able to delete files in directories that the web server has write access to. Web servers are not generally configured with write access to any directories by default.
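Putting those pointers together, a delete handler might look roughly like the sketch below. It assumes the CGI module is installed, that the form posts a field named file, that deletable files live in /var/www/uploads, and that the listing page is main.cgi; all four are placeholders. The helper safe_path is my own invention and is deliberately strict, so that a request cannot name a file outside that directory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use CGI;
use File::Basename qw(basename);

# Assumed location of deletable files; adjust for your setup.
my $UPLOAD_DIR = '/var/www/uploads';

# Return the full path for a plain file name inside $dir, or nothing if the
# name is empty, has directory parts, or contains unexpected characters.
sub safe_path {
    my ($dir, $name) = @_;
    return unless defined $name && length $name;
    return unless $name eq basename($name);    # no directory components
    return unless $name =~ /\A[\w.-]+\z/;      # conservative whitelist
    return if $name =~ /\A\.+\z/;              # reject "." and ".."
    return "$dir/$name";
}

# Only act when running under a web server.
if (defined $ENV{REQUEST_METHOD}) {
    my $q = CGI->new;
    if ($q->request_method ne 'POST') {
        # Refuse to change server state in response to GET.
        print $q->header(-status => '405 Method Not Allowed', -type => 'text/plain');
        print "Use POST to delete a file.\n";
    }
    else {
        my $path = safe_path($UPLOAD_DIR, scalar $q->param('file'));
        if ($path && unlink $path) {
            print $q->redirect('main.cgi');    # back to the file listing
        }
        else {
            print $q->header(-status => '400 Bad Request', -type => 'text/plain');
            print "Could not delete file.\n";
        }
    }
}
```

The listing page would then render a small form with method="post" and a hidden file field for each entry, instead of a plain delete link.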
The problem was that there was no redirect after deleting. After adding a redirect, it worked like a charm.
unlink glob ($file);
print redirect(-url=>'http://main.cgi');
thanks

Replace authenticated user agent login/ page scrape using Perl and Mojolicious

I am trying to port some old web scraping scripts written using older Perl modules to work using only Mojolicious.
I have written a few basic scripts with Mojo but am puzzled by an authenticated login that uses a secure login site, and how this should be handled with a Mojo::UserAgent script. Unfortunately the only example I can see in the documentation is for basic authentication without forms.
The Perl script I am trying to convert to work with Mojo::UserAgent is as follows:
#!/usr/bin/perl
use LWP;
use LWP::Simple;
use LWP::Debug qw(+);
use LWP::Protocol::https;
use WWW::Mechanize;
use HTTP::Cookies;
# login first before navigating to pages
# Create our automated browser and set up to handle cookies
my $agent = WWW::Mechanize->new();
$agent->cookie_jar(HTTP::Cookies->new());
$agent->agent_alias( 'Windows IE 6' ); #tell the website who we are (old!)
# get login page
$agent->get("https://reg.mysite.com");
$agent->success or die $agent->response->status_line;
# complete the user name and password form
$agent->form_number (1);
$agent->field (username => "user1");
$agent->field (password => "pass1");
$agent->click();
#try to get member's only content page from main site on basis we are now "logged in"
$agent->get("http://www.mysite.com/memberpagesonly1");
$agent->success or die $agent->response->status_line;
$member_page = $agent->content();
print "$member_page\n";
So the above works fine. How to convert to do the same job in Mojolicious?
Mojolicious is a web application framework. While Mojo::UserAgent works well as a low-level HTTP user agent, and provides facilities that are unavailable from LWP (in particular native support for asynchronous requests and IPv6), neither is as convenient to use as WWW::Mechanize for web scraping.
WWW::Mechanize subclasses LWP::UserAgent to interface with the internet, and uses HTML::Form to process the forms it finds. Mojo::UserAgent has no facility for processing HTML forms, and so building the corresponding HTTP requests is not at all straightforward. Information such as the HTTP method used (GET or POST), the names of the form fields, and the insertion of default values for hidden fields is all handled automatically by HTML::Form and is left to the programmer if you restrict yourself to Mojo::UserAgent.
It seems to me that even trying to use Mojo::UserAgent in combination with HTML::Form is problematic, as the former requires a Mojo::Transaction::HTTP object to represent the submission of a filled-in form, whereas the latter generates HTTP::Request objects for use with LWP.
In short, unless you are willing to largely rewrite WWW::Mechanize, I think there is no way to reimplement your software using Mojolicious modules.
You can use WWW::Mechanize to talk to web servers, and you can use Mojo::DOM to benefit from Mojolicious as a parser. The best of both worlds... :)
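As a rough illustration of that combination, the sketch below lets WWW::Mechanize do the fetching and hands the resulting HTML to Mojo::DOM for CSS-selector parsing. The helper name extract_links and the choice of collecting link targets are just for demonstration; in the login script above you would pass $agent->content to Mojo::DOM after the member page has been fetched.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
use Mojo::DOM;

# Hypothetical helper: return the href of every link in an HTML string,
# using Mojo::DOM's CSS selectors.
sub extract_links {
    my ($html) = @_;
    my $hrefs = Mojo::DOM->new($html)->find('a[href]')->map(attr => 'href');
    return @{ $hrefs->to_array };
}

# Fetch only when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $mech = WWW::Mechanize->new;    # autocheck is on: HTTP errors die
    $mech->get($url);
    print "$_\n" for extract_links($mech->content);
}
```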

Getting error in accessing a link using WWW::Mechanize

Getting the following error in a JavaScript link using perl - WWW::Mechanize.
Error GETing javascript:submt_os('2','contact%20info','contact%20info'):Protocol scheme 'javascript' is not supported
This is my code:
#!/usr/bin/perl
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$uri="http://tinyurl.com/76xv4ld";
$mech->get($uri);
# error on this link
$mech->follow_link( text => 'Contact Information');
print $mech->content();
Once I get the page, I want to click Contact Information.
Is there any other way to click Contact Information?
You can't follow a JavaScript link with WWW::Mechanize. Even if you had a JavaScript interpreter you'd need complete DOM support for anything non-trivial.
So you need to script a web browser. I use Selenium in my testing, which is quite bulky and requires Java. You might want to investigate WWW::Mechanize::Firefox. I've not used it, but it does provide a Mechanize-style interface to Firefox.
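If a full browser is too heavy, one thing that sometimes works is to read the page source, see what the JavaScript function does with its arguments, and issue that request yourself. I don't know what submt_os does on this particular site, so the sketch below only covers the first step: pulling the function name and arguments out of the href from the error message. The helper parse_js_link is hypothetical and handles only single-quoted string arguments.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split a "javascript:func('a','b')" href into the function name and its
# single-quoted string arguments, percent-decoding each argument.
sub parse_js_link {
    my ($href) = @_;
    my ($func, $args) = $href =~ /\Ajavascript:\s*(\w+)\((.*)\)\s*;?\s*\z/s
        or return;
    my @args = $args =~ /'([^']*)'/g;
    s/%([0-9A-Fa-f]{2})/chr hex $1/ge for @args;
    return ($func, @args);
}

my ($func, @args) = parse_js_link(
    q{javascript:submt_os('2','contact%20info','contact%20info')}
);
print "$func: ", join(' | ', @args), "\n";
# prints "submt_os: 2 | contact info | contact info"
```

In the script above, the href could be obtained with $mech->find_link( text => 'Contact Information' )->url instead of follow_link. What to do with the arguments afterwards (for example, which form to submit or which URL to request) depends entirely on the site's own JavaScript.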