How can I find the full URL from a relative URL in Perl? - perl

I'm new to perl but was wondering if anyone know of a script that was similar to the following PHP version which works great!
private function resolve_href ( $base, $href ) {
if (!$href)
return $base;
$rel_parsed = parse_url($href);
if (array_key_exists('scheme', $rel_parsed))
return $href;
$base_parsed = parse_url("$base ");
if (!array_key_exists('path', $base_parsed))
$base_parsed = parse_url("$base/ ");
if ($href{0} === "/")
$path = $href;
else
$path = dirname($base_parsed['path']) . "/$href";
$path = preg_replace('~/\./~', '/', $path);
$parts = array();
foreach ( explode('/', preg_replace('~/+~', '/', $path)) as $part ) {
if ($part === "..")
array_pop($parts);
elseif ($part!="")
$parts[] = $part;
}
$dir = ( ( array_key_exists('scheme', $base_parsed)) ? $base_parsed['scheme'] . '://' . $base_parsed['host'] : "" ) . "/" . implode("/", $parts);
return str_replace( "\/", '', $dir );
}
Any help is much appreciated

See URI:
#!/usr/bin/perl
use strict; use warnings;
use URI;
my $u = URI->new_abs('../foobar', 'http://foo.com/bar/poo/');
print $u->canonical;
Output:
http://foo.com/bar/foobar

Related

Fetch and upload files and prefixed files to s3 using perl

I am trying to fetch files from s3 using Amazon::S3 module in Perl . I am successfully able to download files which are not prefixed but unable to fetch prefixed files like test/abc.txt.
I am using below code.
sub export_bucket {
my ($conn, $bucket, $directory) = #_;
$bucket = $conn->bucket($bucket);
my $response = $bucket->list();
print $response->{bucket}."\n";
for my $key (#{ $response->{keys} }) {
print "\t".$key->{key}."\n";
_export_file($conn,$bucket,$key->{key}, $directory.'/'.$key->{key});
}
}
sub _export_file {
my ($conn,$bucket,$name,$path) = #_;
print "Downloading $name file","\n";
my $test = $bucket->get_key_filename($name,'GET',$path);
print Dumper($test);
my $acl = $bucket->get_acl($name);
print Dumper($acl);
open my $acl_file, '>', $path.'.acl';
print $acl_file $acl;
close $acl_file;
}
Suggest me what changes should i make so that when a prefixed/folder comes i should be able to download the folder as well.
Thanks
You need to modify you code to create the target directory on your local filesystem if it does not already exist. It should look something like this:
use File::Path qw[make_path];
sub export_bucket {
my ( $conn, $bucket, $directory ) = #_;
$bucket = $conn->bucket($bucket);
my $response = $bucket->list();
print $response->{bucket} . "\n";
for my $key ( #{ $response->{keys} } ) {
print "\t" . $key->{key} . "\n";
_export_file( $conn, $bucket, $key->{key}, $directory . '/' . $key->{key} );
}
}
sub _export_file {
my ( $conn, $bucket, $name, $path ) = #_;
print "Downloading $name file", "\n";
my $test = $bucket->get_key_filename( $name, 'GET', $path );
print Dumper($test);
my $acl = $bucket->get_acl($name);
print Dumper($acl);
## get path directory part
my ($dir_part) = $path =~ /(.+)\/[^\/]+$/;
unless ( -d $dir_part ) {
make_path($dir_part);
}
open my $acl_file, '>', $path . '.acl';
print $acl_file $acl;
close $acl_file;
}
Perhaps $directory has a trailing "/" already in the case that is failing (or $key->{key} has a preceding slash)? Try debug-printing $directory and $path in the code to see exactly what your string arguments are.

Moving gmail messages with Net::IMAP::Simple

I am trying to use Net::IMAP::Simple to move mail to an old_messages folder after I have read and stripped the attachments from them, but when I do it all of the moved messages are blank and from "unknown sender", and the inbox is unchanged. I've checked around and nobody seems to have had this problem before.
I have also tried this using both Email::Simple and Email::MIME as the $es object passed as an argument in the statement
$imap->put( 'OLD_MESSAGES', $es, "") or warn $imap->errstr
but neither worked.
Here's my code using Email::MIME
use strict;
use warnings;
# required modules
use Net::IMAP::Simple;
use Email::MIME;
use IO::Socket::SSL;
use Email::MIME::Attachment::Stripper;
# fill in your details here
my $username = 'usersite.com';
my $password = 'password';
my $mailhost = 'imap.gmail.com';
# Connect
my $imap = Net::IMAP::Simple->new( $mailhost, port => 993, use_ssl => 1, )
|| die "Unable to connect to IMAP: $Net::IMAP::Simple::errstr\n";
# Log in
if ( !$imap->login( $username, $password ) ) {
print STDERR "Login failed: " . $imap->errstr . "\n";
exit( 64 );
}
# Look in the the INBOX
my $nm = $imap->select( 'INBOX' );
# How many messages are there?
my ( $unseen, $recent, $num_messages ) = $imap->status();
print "unseen: $unseen, recent: $recent, total: $num_messages\n\n";
my $filepath = "C:/Users/doug/Desktop/gmail/";
## Iterate through unseen messages
for ( my $i = 1 ; $i <= $nm ; $i++ ) {
if ( !$imap->seen( $i ) ) {
next;
}
else {
my $es = Email::MIME->new( join '', #{ $imap->get( $i ) } );
#my $es = Email::MIME->new( join '', #{ $imap->top($i) } );
my $text = $es->body;
my $stripper = Email::MIME::Attachment::Stripper->new( $es );
my #attachments = $stripper->attachments;
printf(
"[%03d] %s\n\t%s\n%s",
$i,
$es->header( 'From' ),
$es->header( 'Subject' ), $text
);
my $l = 0;
foreach $_ ( #attachments ) {
my $fh = IO::File->new();
binmode( $fh );
open( $fh, '>', "$filepath" . "$_->{filename}" );
print $fh "$_->{payload}\n";
$fh->close;
}
$imap->put( 'OLD_MESSAGES', $es, "" ) or warn $imap->errstr;
}
}
# Disconnect
$imap->quit;
exit;
I had to strip down my code, but below is the gist of it. Works for me before I cut it out and pasted here, lemme know if it works else i'll edit it
use Mail::IMAPClient;
use Email::MIME::Attachment::Stripper;
use other stuff as needed....
# login
my $sock = IO::Socket::SSL->new(PeerAddr=>'imap.gmail.com',PeerPort=>993);
my $imap = Mail::IMAPClient->new(Socket => $sock, User => $user, Password => $pass, Ignoresizeerrors => 1);
die $imap->LastError() unless $imap->IsAuthenticated();
# decoding mime (you can probably skip)
my $message = $imap->message_string(123);
my $decoded_mime = "";
my #decoded_list = MIME::Words::decode_mimewords($message);
for(my $i=0; $i<scalar(#decoded_list); ++$i) {
my #set = #{$decoded_list[$i]};
if(scalar(#set) == 1) {
$decoded_mime .= $set[0];
}
else {
eval {
Encode::from_to($set[0],$set[1],'utf-8');
};
if($#) { eval {}; }
$decoded_mime .= $set[0];
}
}
$message = $decoded_mime;
# Strip attachments
$mime = Email::MIME::Attachment::Stripper->new($message);
$mime = $mime->message; # convert to regular Email::MIME

Renaming files using hash table in perl

I have made a perl code which is shown below. Here what I am trying to do is first get input from a text file consisting of a HTTP URL with a Title.
thus the first regex is the title and the second regex fetches the id from inside the URL.
All these values are inserted into the hash table %myfilenames().
So this hash table has key as the URL id, and value as the Title. Everything till here works fine, now I have a set of files on my computer which have the ID in their name which we extracted from the URL.
What I want to do is that if the ID is there in the hash table, then the files name should change to the value assigned to the ID. Now the output at the print statement in the last function is correct but I am unable to rename the files. I tried many things, but nothing works. Can someone help please.
example stuff:
URL: https://abc.com/789012 <--- ID
Value (new Title) : ABC
file name on computer = file-789012 <---- ID
new file name = ABC
My code:
use File::Slurp;
use File::Copy qw(move);
open( F, '<hadoop.txt' );
$key = '';
$value = '';
%myfilenames = ();
foreach (<F>) {
if ( $_ =~ /Lecture/ ) {
$value = $_;
}
if ( $_ =~ /https/ ) {
if ( $_ =~ /\d{6}/ ) {
$key = $&;
}
}
if ( !( $value eq '' || $key eq '' ) ) {
#print "$key\t\t$value";
$myfilenames{$key} = $value;
$key = '';
$value = '';
}
}
#while ( my ( $k, $v ) = each %myfilenames ) { print "$k $v\n"; }
my #files = read_dir 'C:\\inputfolder';
for (#files) {
if ( $_ =~ /\d{6}/ ) {
$oldval = $&;
}
$newval = $myfilenames{$oldval};
chomp($newval);
print $_ , "\t\t$newval" . "\n";
$key = '';
}
You probably didn't prepend the path to the file names. The following works for me (on a Linux box):
#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp qw{ read_dir };
my $dir = 0;
mkdir $dir;
open my $FH, '>', "$dir/$_" for 123456, 234567;
my $key = my $value = q();
my %myfilenames = ();
for (<DATA>) {
chomp;
$value = $_ if /Lecture/;
$key = $1 if /https/ and /(\d{6})/;
if ($value ne q() and $key ne q()) {
$myfilenames{$key} = $value;
$key = $value = q();
}
}
my #files = read_dir($dir);
for (#files) {
if (/(\d{6})/) {
my $oldval = $1;
my $newval = $myfilenames{$oldval};
rename "$dir/$oldval", "$dir/$newval";
}
}
__DATA__
Lecture A1
https://123456
# Comment
Lecture A2
https://234567

How to pass a variable in a URL using Perl?

I'm trying to pass parameters in a URL. I don't know what's missing, I tried to see how the URL looks after executing this script.
my $request3 = HTTP::Request->new(GET => $sql_activation);
my $useragent = LWP::UserAgent->new();
$useragent->timeout(10);
my $response2 = $useragent->request($request3);
if ($response2->is_success) {
my $res2 = $response2->content;
if ($res =~ m/[#](.*):(.*)[#]/g) {
my ($key, $username) = ($1, $2);
print "[+] $username:$key \n\n";
}
else {
print "[-] Error \n\n";
}
}
my $link =
"http://localhost/wordpress/wp-login.php?action=rp&key="
. $key
. "&login="
. $username;
sub post_url {
my ($link, $formref) = #_;
my $ua = new LWP::UserAgent(timeout => 300);
$ua->agent('perlproc/1.0');
my $get = $ua->post($link, $formref);
if ($get->is_success) {
print "worked \n";
}
else {
print "Failed \n";
}
}
After executing the script the URL is like this
site/wordpress/wp-login.php?action=rp&key=&login=
Perl has block level scope. You define $key and $username in the block following an if statement. They don't live beyond that.
You need to create them (with my) before that block.
# HERE
my ( $key, $username );
if ( $response2->is_success ) {
my $res2 = $response2->content;
if ( $res =~ m/[#](.*):(.*)[#]/g ) {
# Don't say my again
( $key, $username ) = ( $1, $2 );
}
else { print "[-] Error \n\n"; }
}

Why can't I fetch www.google.com with Perl's LWP::Simple?

I cant seem to get this peice of code to work:
$self->{_current_page} = $href;
my $response = $ua->get($href);
my $responseCode = $response->code;
if( $responseCode ne "404" ) {
my $content = LWP::Simple->get($href);
die "get failed: " . $href if (!defined $content);
}
Will return error: get failed: http://www.google.com
The full code is as follows:
#!/usr/bin/perl
use strict;
use URI;
use URI::http;
use File::Basename;
use DBI;
use LWP::Simple;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->max_redirect(0);
package Crawler;
sub new {
my $class = shift;
my $self = {
_url => shift,
_max_link => 0,
_local => 1
};
bless $self, $class;
return $self;
}
sub trim{
my( $self, $string ) = #_;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
sub process_image {
my ($self, $process_image) = #_;
$self->{_process_image} = $process_image;
}
sub local {
my ($self, $local) = #_;
$self->{_local} = $local;
}
sub max_link {
my ($self, $max_link) = #_;
$self->{_max_link} = $max_link;
}
sub x_more {
my ($self, $x_more) = #_;
$self->{_x_more} = $x_more;
}
sub resolve_href {
my ($base, $href) = #_;
my $uri = URI->new($href);
return $uri->rel($base);
}
sub write {
my ( $self, $ref, $data ) = #_;
open FILE, '>c:/perlscripts/' . $ref . '_' . $self->{_process_image} . '.txt';
foreach( $data ) {
print FILE $self->trim($_) . "\n";
}
close( FILE );
}
sub scrape {
my #m_error_array;
my #m_href_array;
my #href_array;
my ( $self, $DBhost, $DBuser, $DBpass, $DBname ) = #_;
my ($dbh, $query, $result, $array);
my $DNS = "dbi:mysql:$DBname:$DBhost:3306";
$dbh = DBI->connect($DNS, $DBuser, $DBpass ) or die $DBI::errstr;
if( defined( $self->{_process_image} ) && ( -e 'c:/perlscripts/href_w_' . $self->{_process_image} . ".txt" ) ) {
open ERROR_W, "<c:/perlscripts/error_w_" . $self->{_process_image} . ".txt";
open M_HREF_W, "<c:/perlscripts/m_href_w_" . $self->{_process_image} . ".txt";
open HREF_W, "<c:/perlscripts/href_w_" . $self->{_process_image} . ".txt";
#m_error_array = <ERROR_W>;
#m_href_array = <M_HREF_W>;
#href_array = <HREF_W>;
close ( ERROR_W );
close ( M_HREF_W );
close ( HREF_W );
}else{
#href_array = ( $self->{_url} );
}
my $z = 0;
while( #href_array ){
if( defined( $self->{_x_more} ) && $z == $self->{_x_more} ) {
last;
}
if( defined( $self->{_process_image} ) ) {
$self->write( 'm_href_w', #m_href_array );
$self->write( 'href_w', #href_array );
$self->write( 'error_w', #m_error_array );
}
$self->{_link_count} = scalar #m_href_array;
my $href = shift( #href_array );
my $info = URI::http->new($href);
my $host = $info->host;
$host =~ s/^www\.//;
$result = $dbh->prepare("INSERT INTO `". $host ."` (URL) VALUES ('$href')");
if( ! $result->execute() ){
$result = $dbh->prepare("CREATE TABLE `" . $host . "` ( `ID` INT( 255 ) NOT NULL AUTO_INCREMENT , `URL` VARCHAR( 255 ) NOT NULL , PRIMARY KEY ( `ID` )) ENGINE = MYISAM ;");
$result->execute();
}
$self->{_current_page} = $href;
my $response = $ua->get($href);
my $responseCode = $response->code;
if( $responseCode ne "404" ) {
my $content = LWP::Simple->get($href);
die "get failed: " . $href if (!defined $content);
}
#print $responseCode;
}
}
1;
#$query = "SELECT * FROM `actwebdesigns.co.uk` ORDER BY ID DESC";
#$result = $dbh->prepare($query);
#$result->execute();
#while( $array = $result->fetchrow_hashref() ) {
# print $array->{'URL'} . "\n";
#}
EDIT:
Still not working with redirect fixed.
my $redirect_limit = 10;
my $y = 0;
while( 1 && $y le $redirect_limit ) {
my $response = $ua->get($href);
my $responseCode = $response->code;
if( $responseCode == 200 || $responseCode == 301 || $responseCode == 302 ) {
if( $responseCode == 301 || $responseCode == 302 ) {
$href = $response->header('Location');
}else{
last;
}
}else{
push( #m_error_array, $href );
last;
}
$y++;
}
if( $y ne $redirect_limit ) {
if( ! defined( $self->{_url_list} ) ) {
my #url_list = ( $href );
}else{
my #url_list = $self->{_url_list};
push( #url_list, $href );
$self->{_url_list} = #url_list;
}
my $content = LWP::Simple->get($href);
die "get failed: " . $href if (!defined $content);
#$result = $dbh->prepare("INSERT INTO `". $host ."` (URL) VALUES ('$href')");
#if( ! $result->execute() ){
# $result = $dbh->prepare("CREATE TABLE `" . $host . "` ( `ID` INT( 255 ) NOT NULL AUTO_INCREMENT , `URL` VARCHAR( 255 ) NOT NULL , PRIMARY KEY ( `ID` )) ENGINE = MYISAM ;");
# $result->execute();
#}
print "good";
}else{
push( #m_error_array, $href );
}
You should examine the response code to see what's happening (you're already checking for 404s). I get a 302 - a redirect.
For example:
die "get failed ($responseCode): " . $href if (!defined $content);
Resulting message:
get failed (302): http://www.google.com at goog.pl line 20.
A couple of thoughts.
1/ You seems to be using the string comparison operators (le, ne) to compare numbers. You should use the numeric comparison operators (<=, !=) instead.
2/ The value you get back from the LWP::UserAgent::get call is an HTTP::Response object. Judicious use of that class's "is_foo" method might make your code a bit cleaner.
I don't know if either of these will solve your problem. But they'll improve the quality of your code.
Here's your problem:
my $content = LWP::Simple->get($href);
That passes the string "LWP::Simple" as the first argument to 'get'. You want:
my $content = LWP::Simple::get($href);
Check your SELinux settings.
SELINUX enabled systems will not allow an outgoing connection from a web agent (httpd).
This page can tell you more about SELinux and HTTPD settings:
http://wiki.centos.org/TipsAndTricks/SelinuxBooleans
Enable outbound web connections from Apache in a Perl script:
# setsebool -P httpd_can_network_connect on