wget not working properly inside a Perl program - perl

I am trying to download some XML files from a given URL. Below is the code I have used for this:
use strict;
use warnings;
my $url ='https://givenurl.com/';
my $username ='scott';
my $password='tiger';
system("wget --user=$username --password=$password $url") == 0 or die "system execution failed ($?): $!";
local $/ = undef;
open(FILE, "<index.html") or die "not able to open $!";
my $index = <FILE>;
my @childs = map /<a\s+href\=\"(AAA.*\.xml)\">/g , $index;
for my $xml (@childs)
{
system("wget --user=$username --password=$password $url/$xml");
}
But when I run this, it gets stuck on the wget command in the for loop. It seems wget is not able to fetch the files properly. Any clue or suggestion?
Thank you.
Man

You shouldn't use an external command in the first place.
Ensure that WWW::Mechanize is available, then use code like:
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
...
$mech->credentials($username, $password);
$mech->get($url);
foreach my $link ($mech->find_all_links(url_regex=>qr/\bAAA/)) {
$mech->get($link);
...
}

If $url or $xml contains any shell metacharacters (? and & are common ones in URLs), then you may need to either quote them properly:
system("wget --user=$username --password=$password '$url/$xml'");
system qq(wget --user=$username --password=$password "$url/$xml");
or use the LIST form of system, which bypasses the shell:
system( 'wget', "--user=$username", "--password=$password", "$url/$xml");
to get the command to work properly.
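As a self-contained illustration of the LIST form, here is a sketch that uses perl itself (via $^X) as a stand-in for wget, with a made-up URL; it shows that ? and & reach the child process intact because no shell is involved:

```perl
use strict;
use warnings;

# Hypothetical URL containing the two troublesome metacharacters.
my $url = 'https://example.com/data.xml?session=1&id=2';

# LIST form: each element is passed to the child as one argument,
# with no shell in between.  The child simply checks that it received
# the URL unmangled.
my $status = system(
    $^X, '-e',
    'exit($ARGV[0] eq "https://example.com/data.xml?session=1&id=2" ? 0 : 1)',
    $url,
);
print $status == 0 ? "URL passed through unchanged\n" : "URL was mangled\n";
```

With the string form of system, the same URL would go through a shell, which treats ? as a glob character and & as a background operator.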

Maybe it's a problem with the path to wget. What if you use:
system("/usr/bin/wget --user=$username --password=$password $url")
Or I guess it could be a problem with the variables passed to system: ($username, $password, $url).

Related

What's wrong with my IF statement and LWP::Simple?

I am trying to create a simple scraper, and I am using getstore(), but the script won't create the .txt file when used within an if statement. What am I doing wrong there?
Thanks,
Carlos N.
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
my $url;
my $content;
print "Enter URL:";
chomp($url = <STDIN>);
$content = get($url);
if ($content =~ s%<(style|script)[^<>]*>.*?</\1>|</?[a-z][a-z0-9]*[^<>]*>|<!--.*?-->%%g) {
$content = getstore($content,"../crawled_text.txt");
}
die "Couldn't get $url" unless defined $content;
From the LWP::Simple documentation:
my $code = getstore($url, $file)
Gets a document identified by a URL and stores it in the file. The
return value is the HTTP response code.
Your first parameter is stripped HTML content and likely not a URL. You could use a debugger or print statements to learn more about the contents of your variables and whether your program enters the if block.
getstore takes a URL as its first parameter and stores the document it points to in a file. What you want is simply to store $content in a file, so use this instead:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use Path::Tiny;
my $url = shift || "https://perl.org";
my $content = get($url) or die "Couldn't get $url" ;
if ($content =~ s%<(style|script)[^<>]*>.*?</\1>|</?[a-z][a-z0-9]*[^<>]*>|<!--.*?-->%%g) {
my $crawled_text = path("../crawled_text.txt");
$crawled_text->spew_utf8($content)
}
I have also made some small style changes and used Path::Tiny to save the content to a file. You can use plain open and print (or say) if you prefer. Using shift also allows the URL to be taken as a command-line argument, which is more idiomatic than prompting the user for it.
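For reference, here is a sketch of the plain open/print equivalent of the Path::Tiny call, using only built-ins; the content string is just a stand-in for the stripped page text:

```perl
use strict;
use warnings;

# Stand-in for the stripped HTML content, including a non-ASCII
# character to show why the UTF-8 layer matters.
my $content = "some stripped text \x{263A}\n";

# Equivalent of Path::Tiny's spew_utf8: open with an encoding layer
# and print to the handle.
open my $out, '>:encoding(UTF-8)', 'crawled_text.txt' or die "open: $!";
print {$out} $content;
close $out or die "close: $!";

# Read it back to confirm the round trip.
open my $in, '<:encoding(UTF-8)', 'crawled_text.txt' or die "open: $!";
my $back = do { local $/; <$in> };
close $in;
unlink 'crawled_text.txt';
print $back eq $content ? "round trip ok\n" : "mismatch\n";
```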

Uninitialized value in concatenation

I have the "uninitialized value in concatenation" error that is thoroughly discussed in this forum, and generally refers to an undefined variable.
However, as a newbie, I'm short on "why" the problem exists in the code below.
The error refers to the variables $sb and $filesize.
Any insight is greatly appreciated.
Thank you!!!
#!/usr/bin/perl
use strict;
use warnings;
use File::stat;
#The directory where you store the filings
my $dir="/Volumes/EDGAR1/Edgar/Edgar2/10K/2009";
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
# Use a regular expression to ignore files beginning with a period
next if ($file =~ m/^\./);
#my $form_type=substr($line,62,12);
#my $cik=substr($line,74,10);
#my $file_date=substr($line,86,10);
#Note that for file date, we need to get rid of
#the - with the following regular expression.
#month-day-year and some years there is not.
#This regular expression
#my $file_date=~s/\-//g;
my $filesize = -s "$file";
my $sb = (stat($file))[7];
print "$file,$sb,$filesize\n";
}
closedir(DIR);
exit 0;
You are using the File::stat module. This module implements a stat function that overrides Perl's built-in, and it returns an object instead of a list. So this:
my $sb = (stat($file))[7];
causes $sb to be undefined, because there is only one object in that list. Use the module's accessor instead:
my $sb = stat($file)->size;
Note also that readdir returns bare file names, so both stat and -s need the directory prefixed: use "$dir/$file" instead of "$file", or both will fail (and yield undef) whenever the script is not run from inside that directory.
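A small self-contained sketch of the difference (the file name is made up for the demonstration):

```perl
use strict;
use warnings;
use File::stat;   # overrides the built-in stat()

# Create a throwaway file to stat.
my $tmp = 'stat_demo.txt';
open my $fh, '>', $tmp or die "open: $!";
print {$fh} "hello world\n";
close $fh;

# File::stat's stat() returns one object; use its accessors.
my $sb = stat($tmp) or die "stat: $!";
print 'File::stat size: ', $sb->size, "\n";

# The original 13-element list is still reachable as CORE::stat.
my $core_size = (CORE::stat($tmp))[7];
print "CORE::stat size: $core_size\n";
unlink $tmp;
```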

Diff two remote files using Perl

I have an array of file paths:
@files = ('/home/.../file.txt', '/home/.../file2.txt',...);
I have multiple remote machines with a similar file structure. How can I diff these remote files using Perl?
I thought of using Perl backticks, ssh and using diff, but I am having issues with sh (it doesn't like diff <() <()).
Is there a good Perl way of comparing at least two remote files?
Use rsync to copy the remote files to the local machine, then use diff to find out the differences:
use Net::OpenSSH;
my $ssh1 = Net::OpenSSH->new($host1);
$ssh1->rsync_get($file, 'master');
my $ssh2 = Net::OpenSSH->new($host2);
system('cp -R master remote');
$ssh2->rsync_get($file, 'remote');
system('diff -u master remote');
You can use the Perl Module on CPAN called Net::SSH::Perl to run remote commands.
Link: http://metacpan.org/pod/Net::SSH::Perl
Example from the Synopsis:
use Net::SSH::Perl;
my $ssh = Net::SSH::Perl->new($host);
$ssh->login($user, $pass);
my($stdout, $stderr, $exit) = $ssh->cmd($cmd);
Your command would look something like:
my $cmd = "diff /home/.../file.txt /home/.../file2.txt";
edit: The files are on different servers.
You can still use Net::SSH::Perl to read the files.
#!/bin/perl
use strict;
use warnings;
use Net::SSH::Perl;
my $host = "First_host_name";
my $user = "First_user_name";
my $pass = "First_password";
my $cmd1 = "cat /home/.../file1";
my $ssh = Net::SSH::Perl->new($host);
$ssh->login($user, $pass);
my($stdout1, $stderr1, $exit1) = $ssh->cmd($cmd1);
#now stdout1 has the contents of the first file
$host = "Second_host_name";
$user = "Second_user_name";
$pass = "Second_password";
my $cmd2 = "cat /home/.../file2";
$ssh = Net::SSH::Perl->new($host);
$ssh->login($user, $pass);
my($stdout2, $stderr2, $exit2) = $ssh->cmd($cmd2);
#now stdout2 has the contents of the second file
#write the contents to local files to diff
open(my $fh1, '>', "./temp_file1") or die "Failed to open file 1: $!";
print $fh1 $stdout1;
close $fh1;
open(my $fh2, '>', "./temp_file2") or die "Failed to open file 2: $!";
print $fh2 $stdout2;
close $fh2;
my $difference = `diff ./temp_file1 ./temp_file2`;
print $difference . "\n";
I haven't tested this code, but you could do something like this. Remember to install the Net::SSH::Perl module to run the remote commands.
diff is not implemented in the Perl core modules, but there is another module on CPAN called Text::Diff, so maybe that would work too. Hope this helps!

Perl script for Downloading the file from web

I am trying to automate one of my tasks where I have to download the last 5 releases of some software, let's say Google Talk, from http://www.filehippo.com/download_google_talk/.
I have never done this type of programming, I mean interacting with the web through Perl. I have just read that the CGI module can be used to implement this, so I tried with that module.
If somebody can give me better advice, then please, you are welcome :)
My code :
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;
my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $q = CGI->new;
my $file = $q->param('file') or error('Error: No file selected.');
print "$file\n";
if ($file =~ /^(\w+[\w.-]+\.\w+)$/) {
$file = $1;
}
else {
error('Error: Unexpected characters in filename.');
}
if ($file) {
download($file) or error('Error: an unknown error has occured. Try again.');
}
sub download
{
open(DLFILE, '<', "$path_to_files/$file") or return(0);
print $q->header(-type => 'application/x-download',
-attachment => $file,
'Content-length' => -s "$path_to_files/$file",
);
binmode DLFILE;
print while <DLFILE>;
close (DLFILE);
return(1);
}
sub error {
print $q->header(),
$q->start_html(-title=>'Error'),
$q->h1($_[0]),
$q->end_html;
exit(0);
}
In the above code I am trying to print the file name which I want to download, but it displays an error message. I am not able to figure out why the error "Error: No file selected." is coming.
Sorry, but you are on the wrong track. Your best bet is this module: http://metacpan.org/pod/WWW::Mechanize
This page contains a lot of examples to start with: http://metacpan.org/pod/WWW::Mechanize::Examples
It could be more elegant, but I think this code is easier to understand.
use strict;
use warnings;
use WWW::Mechanize;
my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $mech = WWW::Mechanize->new();
$mech->get( $path_to_files );
$mech->save_content( "download_google_talk.html" );#save the base to see how it looks like
foreach my $link ( $mech->links() ){ #walk all links
my $target = $link->url; #links() returns WWW::Mechanize::Link objects, so extract the URL string
print "link: $target\n";
if ($target =~ m!what_you_want!i){ #if it matches
my $fname = $target;
$fname =~ s!\A.*/!! if $fname =~ m!/!;
$fname .= ".zip"; #add extension
print "Download $target to $fname\n";
$mech->get($target, ":content_file" => $fname );#download the file and store it in $fname
}
}

Open remote file via http

Is there any Perl module, like File::Remote, that works over HTTP (read only)? Something like:
$magic_module->open( SCRAPE, "http://somesite.com/");
while(<SCRAPE>)
{
#do something
}
Yes, of course. You can use LWP::Simple:
use LWP::Simple;
my $content = get $url;
Don't forget to check that the content is defined:
die "Can't download $url" unless defined $content;
$content will be undef if some error occurred during downloading.
Also you can use File::Fetch module:
File::Fetch
->new(uri => 'http://google.com/robots.txt')
->fetch(to => \(my $file));
say($file);
With HTTP::Tiny:
use HTTP::Tiny qw();
my $response = HTTP::Tiny->new->get('http://example.com/');
if ($response->{success}) {
print $response->{content};
}
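A note on HTTP::Tiny's failure mode, sketched below: on any error {success} is false, and internal exceptions are reported with the pseudo-status 599 (an unsupported URL scheme is used here so the example needs no network access):

```perl
use strict;
use warnings;
use HTTP::Tiny;   # in the Perl core since 5.14

# HTTP::Tiny only speaks http/https; asking for ftp:// raises an
# internal exception, which the module turns into a failed response
# with status 599 and the exception text in {content}.
my $response = HTTP::Tiny->new->get('ftp://example.com/');
if ($response->{success}) {
    print $response->{content};
}
else {
    print "failed: status $response->{status}\n";
}
```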
If you want a unified interface to handle local files, remote (HTTP/FTP) files, and whatever else, use the IO::All module.
use IO::All;
# reading local
my $handle = io("file.txt");
while(defined(my $line = $handle->getline)){
print $line
}
# reading remote
$handle = io("http://google.com");
while(defined(my $line = $handle->getline)){
print $line
}