how to download `decoded_content` - perl

*** UPDATED CODE with resume functionality ***
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
# Double quotes so that $ip is interpolated
$ua->credentials("$ip:80", 'Realm', 'username', 'password');
my $response = $ua->mirror($url, $newfile);
if ($response->is_success) {
    print "Download successful.";
}
else {
    print "Error: " . $response->status_line;
}
*** OLD CODE ***
my $ua = LWP::UserAgent->new;
$ua->credentials("$ip:80", 'Realm', 'username', 'password');
my $response = $ua->get($url);
if ($response->is_success) {
    print "Retrieved " . length($response->decoded_content) . " bytes of data.";
}
else {
    print "Error: " . $response->status_line;
}
# The layer needs a leading colon: '>:encoding(UTF-8)'
open my $fh, '>:encoding(UTF-8)', $tmp or die "Cannot open $tmp: $!";
print {$fh} $response->decoded_content;
close $fh;
if ( -e $tmp ) {
    my $filesize = ( stat $tmp )[7];    # element 7 is the size in bytes; 9 is the mtime
    my $origsize = $queue[$rec][1];
    if ( $filesize < $origsize ) {
        print "Resuming download";
        # ... code for resuming the partly downloaded file ...
    }
    else {
        print "File downloaded correctly\n";
    }
}
I'm new to Perl. I can download the decoded_content, though some errors persist.
I need to resume the file download when a partial file already exists.
This is the code I've tried, but I don't know where to start with the resume part, so any pointers would be a great help.

See the mirror method in LWP::UserAgent. To quote the documentation:
This method will get the document identified by $url and store it in file called $filename.
my $response = $ua->mirror($url, $filename); # no single quotes around variables!
See the source code for mirror; it deals correctly with truncated/partially downloaded files.
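If the goal is to continue a partial file rather than fetch it again, one option is an HTTP Range request. The following is only a sketch of that idea, not a description of what mirror does internally. The helper names resume_range and resume_download are made up for this example, and it assumes the server honours the Range header and answers with 206 Partial Content.

```perl
use strict;
use warnings;

# Hypothetical helper: how many bytes are already on disk, and the matching
# Range header value ("bytes=N-" asks for everything from offset N onward).
sub resume_range {
    my ($path) = @_;
    my $have = (-s $path) || 0;    # -s is false for a missing or empty file
    return ($have, "bytes=$have-");
}

# Hypothetical helper: append the missing tail of $url to $path.
sub resume_download {
    my ($url, $path) = @_;
    require LWP::UserAgent;        # loaded lazily; only needed for the transfer

    my ($have, $range) = resume_range($path);
    open my $fh, '>>:raw', $path or die "Cannot open $path: $!";

    my $first    = 1;
    my $response = LWP::UserAgent->new->get(
        $url,
        ($have ? (Range => $range) : ()),
        ':content_cb' => sub {
            my ($chunk, $res) = @_;
            return unless $res->is_success;
            # A plain 200 means the server ignored Range and is sending
            # the whole file again, so discard what was already there.
            if ($first && $res->code == 200) {
                truncate $fh, 0 or die "Cannot truncate $path: $!";
            }
            $first = 0;
            print {$fh} $chunk;
        },
    );
    close $fh or die "Cannot close $path: $!";
    return $response;
}
```

The caller would still check $response->is_success afterwards. A 416 status in this situation usually means the requested offset is at or past the end of the remote file, which often indicates the local copy is already complete.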

Related

cloudflare protection is blocking the website connectivity in mojo::user-agent - perl

I just want to check whether a website link can be reached. The website is included in the code below. Any pointers would be appreciated.
Here is my code:
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $timeout = $ua->request_timeout;
$ua = $ua->request_timeout(10);
my $res = $ua->get('https://www.aba.com')->result;
if ($res->is_success) { print 'Success' }
elsif ($res->is_error) { print 'Failed ' . $res->message }
elsif ($res->code == 301) { print 'Redirect Success ' . $res->headers->location }
else { print 'Manual Check Required URL...' }
The above code is giving the following failed message:
Failed Service Temporarily Unavailable
I resolved the failure as follows; it may be useful to someone else:
use Mojo::Promise;
use Mojo::UserAgent;
my @urls = ('https://www.aba.com');
my $ua = Mojo::UserAgent->new;
my $res;
my @gets = map {
    my $url = $_;
    $res = $ua->get_p( $url )->then(
        sub { print "Success -- valid -- $url" },
        sub { print "Failed -- $url" },
    );
} @urls;
Mojo::Promise->all( @gets )->wait;
The result is:
Success -- valid -- https://www.aba.com
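One caveat with the promise version: as far as I know, the promise returned by get_p is fulfilled whenever the server sends back any HTTP response, including a 503, and is rejected only for connection-level errors. So the "Success" line may not prove that the page actually loaded. A minimal sketch that inspects the status inside the callback follows; describe_result and check_urls are hypothetical names, and the 10-second timeout is carried over from the first snippet.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a status code (and optional Location header)
# into the kind of message the blocking version printed.
sub describe_result {
    my ($url, $code, $location) = @_;
    return "Success -- $url" if $code >= 200 && $code < 300;
    return "Redirect -- $url -> " . ($location // '(no Location header)')
        if $code >= 300 && $code < 400;
    return "Failed ($code) -- $url";
}

# Hypothetical helper: check every URL concurrently and print one line each.
sub check_urls {
    my @urls = @_;
    require Mojo::UserAgent;    # loaded lazily; only needed for real requests
    require Mojo::Promise;

    my $ua   = Mojo::UserAgent->new(request_timeout => 10);
    my @gets = map {
        my $url = $_;
        $ua->get_p($url)->then(
            sub {
                my ($tx) = @_;
                my $res = $tx->result;    # an HTTP response arrived, any status
                print describe_result($url, $res->code, $res->headers->location), "\n";
            },
            sub {
                my ($err) = @_;           # DNS failure, refused connection, timeout
                print "Connection error -- $url: $err\n";
            },
        );
    } @urls;
    Mojo::Promise->all(@gets)->wait if @gets;
}
```

With this arrangement a 503 from a protection layer would be reported as "Failed (503)" rather than as a success.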

program is terminated after else condition in perl

1. After encountering the else condition (invalid URL), the loop terminates and does not process further URLs. 2. Even if the node lookup fails in XPath, nothing is printed to the screen or the file. I want to print that (the node exception) to both the file and the screen.
use LWP::Simple;
use File::Compare;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
use Win32::Console::ANSI;
use Term::ANSIColor;
sub crawl_content{
{
open(FILE, "C:/Users/jeyakuma/Desktop/input/input.txt");
{
while(<FILE>){
chomp;
$url=$_;
foreach ($url){
($domain) = $url =~ m|www.([A-Z a-z 0-9]+.{3}).|x;
}
do 'C:/Users/jeyakuma/Desktop/perl/mainsub.pl';
&domain_check();
my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0" );
my $req = HTTP::Request->new( GET => "$url" );
my $res = $ua->request($req);
if ( $res->is_success ){
print "working on $domain\n";
binmode ":utf8";
my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
my @node = $xp->findnodes_as_string("$xpath") or print "couldn't find the node\n" ;
open HTML, '>:encoding(cp1252)',"C:/Users/jeyakuma/Desktop/ project/data_$date/$site.html";
foreach(<@node>){
print HTML @node;
close HTML ;
}
}
else{
print color("green"), "$domain Invalid url\n", color("reset") and open FILE,">C:/Users/jeyakuma/Desktop/log.txt"; print FILE " $domain Invalid URL";
}
}
}
}
}
do 'C:/Users/jeyakuma/Desktop/perl/comparefinal.pl';
compare_result();
}
The else condition reopens FILE for writing to another file. Thus, at the next iteration of the while (<FILE>) loop, Perl will attempt to read from FILE and fail (because it's now only available for writing, not reading), and the loop will end. You need to use a name other than FILE in the else condition.
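A small sketch of that fix, using lexical filehandles so the input handle and the log handle cannot collide. The paths are passed in as arguments, and is_valid is a placeholder standing in for the real HTTP request in the question; both are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real HTTP check in the question.
sub is_valid {
    my ($url) = @_;
    return $url =~ m{^https?://}i ? 1 : 0;
}

# Reads URLs from $input_path, logs invalid ones to $log_path,
# and returns the number of URLs processed.
sub process_urls {
    my ($input_path, $log_path) = @_;
    open my $in,  '<',  $input_path or die "Cannot read $input_path: $!";
    open my $log, '>>', $log_path   or die "Cannot append to $log_path: $!";

    my $seen = 0;
    while (my $url = <$in>) {
        chomp $url;
        next unless length $url;
        $seen++;
        if (is_valid($url)) {
            print "working on $url\n";
        }
        else {
            # Writes go to $log; $in stays open for reading, so the loop continues.
            print "$url Invalid url\n";
            print {$log} "$url Invalid URL\n";
        }
    }
    close $log or die "Cannot close $log_path: $!";
    close $in;
    return $seen;
}
```

Opening the log once in append mode, outside the loop, also avoids overwriting earlier log entries each time an invalid URL is found.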

Perl sftp downloads with Net::SFTP::Foreign

I'm a beginner. I have written a Perl script which does the following:
-Create a directory under “/x01/abcd/abc_logs/abcd_Logs” by the current date, in the format of “YYYYMMDD” if it has not already been created.
i.e: if the script is run on “01st of jan 2013”, the directory “20130101” will be created under the said path. So whenever there is a need to inspect the logs always look for a directory by the current date.
-Check if the log file(s) have already been downloaded earlier within the same day, and if not log(s) will be downloaded to the TODAY’s directory.
I'm having a hard time coming up with a way to print a message when none of the requested files are on the share, that is, when the user specifies two or more files that are not there. I know this happens because there is a "die" statement in "sub get_LOGS". I just cannot work out how to return a message when none of the files I specify are on the share.
usage of this script is as follows
./abc_logs ....<file(n)>
following is the script.
my $LOGS_LOCAL_PATH = "/x02/abc/abcba2/";
chomp $LOGS_LOCAL_PATH;
my $LOGS_REM_PATH = "/x01/INT/abc/vabc2/";
chomp $LOGS_REM_PATH;
my $TODAY = `date +%Y%m%d`;
chomp $TODAY;
my @GETLOOP = @ARGV;
unless ($#ARGV >= 0) {
print "\nUsage: gtp_logs.pl <file1> <file2> <file3>.....<file(n)>\n\n";
exit;
}
system("clear");
unless ( -d "$LOGS_LOCAL_PATH"."$TODAY") {
print "Directory \"$TODAY\" doesn't exist. So creating the directory..!\n";
print "OK..Done.....!\n\n";
system("mkdir $LOGS_LOCAL_PATH/$TODAY");
}
else {
print "Directory already exists. Logs will be downloaded to ==> \"$LOGS_LOCAL_PATH$TODAY\".....!\n\n";
}
# if_DOWNLOADED($LOGS_LOCAL_PATH,$TODAY,@GETLOOP);
chdir("$LOGS_LOCAL_PATH"."$TODAY") || die "cannot cd to ($!)";
foreach my $GETL (@GETLOOP) {
my $is_downloaded = if_DOWNLOADED($LOGS_LOCAL_PATH,$TODAY,$GETL);
if(!$is_downloaded)
{
get_LOGS("172.25.70.221","abc","abc2","/x01/INT/abc",$GETL);
print "File \"$GETL\" downloaded to ==> \"$LOGS_LOCAL_PATH$TODAY\"\n\n";
}
else
{
print "File \"$GETL\" has already been Downloaded to ==> \"$LOGS_LOCAL_PATH$TODAY\"\n\n";
}
}
sub get_LOGS {
my $LOG_HOST = shift;
my $REM_USER = shift;
my $REM_PASSW = shift;
my $REM_PATH = shift;
my $REM_FILE = shift;
print "Connecting to the sftp share! Please wait....!\n";
my $sftp = Net::SFTP::Foreign->new($LOG_HOST, user => $REM_USER, password => $REM_PASSW);
$sftp->setcwd($REM_PATH) or die "unable to change cwd: " . $sftp->error;
print "OK. On the share! Downloading the file \"$REM_FILE\"...................!\n\n\n\n";
$sftp->error and die "Problem connecting to the share...!!!! " . $sftp->error;
$sftp->get($REM_FILE) or die "File does not seem to be present on the remote share. Please re-request..!!!" . $sftp->error;
return $REM_FILE;
}
sub if_DOWNLOADED {
my $DWD_FILE_PATH = shift;
my $DWD_DIR = shift;
my $DWD_FILE = shift;
if (-e "$DWD_FILE_PATH/$DWD_DIR/$DWD_FILE")
{
return 1;
}
else
{
return 0;
}
}
Please can someone help me finding a solution to this matter? Please try to use the same script and modify.
/V
Some comments on your code:
Use strict and warnings in order to catch lots of errors early.
Read a book on style (e.g. Damian Conway's Perl Best Practices). In any case, try to be consistent when naming variables, subroutines, and everything else, including their case.
When you have to use some calculated value in several places, try to calculate it once and save it in a variable.
Don't use subroutines for trivial things.
You don't need to call chomp on variables you have defined and that don't have a "\n" character at the end.
Opening a new SFTP connection for every file transfer is very inefficient. You can open just one at the beginning and use it for all the transfers.
And now, a simplified version of your script:
#!/usr/bin/perl
use strict;
use warnings;
use Net::SFTP::Foreign;
my $host = "172.25.70.221";
my $user = "abc";
my $password = "abc1234321";
my $LOGS_LOCAL_PATH = "/x02/ABC/abc2";
my $LOGS_REM_PATH = "/x01/INT/abc/vim";
my $TODAY = `date +%Y%m%d`;
chomp $TODAY;
my $TODAY_LOCAL_PATH = "$LOGS_LOCAL_PATH/$TODAY";
my @files = @ARGV;
@files or die "\nUsage: gtp_logs.pl <file1> <file2> <file3>.....<file(n)>\n\n";
system("clear");
if ( -d $TODAY_LOCAL_PATH) {
print "Directory already exists. Logs will be downloaded to ==> \"$TODAY_LOCAL_PATH\".....!\n\n";
}
else {
print "Directory \"$TODAY\" doesn't exist. So creating the directory..!\n";
mkdir "$TODAY_LOCAL_PATH" or die "unable to create directory: $!\n";
print "OK..Done.....!\n\n";
}
chdir $TODAY_LOCAL_PATH or die "cannot cd to ($!)\n";
my $sftp = Net::SFTP::Foreign->new($host, user => $user, password => $password);
$sftp->error
and die "Problem connecting to the share...!!!! " . $sftp->error;
my $ok = 0;
my $failed = 0;
foreach my $file (@files) {
if (-e "$TODAY_LOCAL_PATH/$file") {
print "File \"$file\" has already been Downloaded to ==> \"$TODAY_LOCAL_PATH\"\n";
}
else {
if ($sftp->get("$LOGS_REM_PATH/$file")) {
print "File \"$file\" downloaded to ==> \"$TODAY_LOCAL_PATH\"\n";
$ok++;
}
else {
print "Unable to download file \"$file\" : " . $sftp->error . "\n";
$failed++;
}
}
}
print "$ok files have been downloaded, $failed files failed!\n\n";

perl get webpage error with LWP::Simple

I have a project I'm working on for school but I seem to be getting an error here...
I get "Can't call method 'content' on an undefined value at line 5"
use LWP::Simple;
for(my $id=0;$id<55;$id++)
{
my $response = get("http://www.gamereplays.org/community/index.php?act=medals&CODE=showmedal&MDSID=" + $id );
my $content = $response->content;
for(my $id2=0;$id2<10;$id2++)
{
$content =~ /<img src="http:\/\/www\.gamereplays.org\/community\/style_medals\/(.*)$id2\.gif" alt=""\/>/;
$url = "http://www.gamereplays.org/community/style_medals/" . $1 . $id2 . ".gif";
getstore($url, $1 . $id2 . ".gif");
}
}
The get function in LWP::Simple doesn't return a response object; it directly returns a string containing the response body (or undef on failure).
Also, put a pause between requests to avoid pounding the target website.
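A hedged sketch of the corrected fetch step. It also joins the URL with ".", since "+" is numeric addition in Perl and would not concatenate the strings. The names medal_url and fetch_medal_page are made up for this example, and the one-second pause is an arbitrary choice.

```perl
use strict;
use warnings;

my $BASE = 'http://www.gamereplays.org/community/index.php?act=medals&CODE=showmedal&MDSID=';

# Hypothetical helper: build the page URL with "." (string concatenation).
sub medal_url {
    my ($id) = @_;
    return $BASE . $id;
}

# Hypothetical helper: fetch one medal page and return its body as a string.
sub fetch_medal_page {
    my ($id) = @_;
    require LWP::Simple;    # loaded lazily; only needed for the actual request

    # get() returns the page body itself, or undef if the request failed,
    # so there is no ->content method to call on the result.
    my $content = LWP::Simple::get(medal_url($id));
    warn "Could not fetch medal page $id\n" unless defined $content;

    sleep 1;                # be polite to the remote server between requests
    return $content;
}
```

The regex match from the question can then be applied directly to the returned string, after checking that it is defined.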

Copy and retain previous output for backup of transformed json data

I have a Perl script that transforms JSON data into Perl data structures and saves the output in files called teams.txt and backyard.txt, plus a file called backup.txt, which is a copy of teams.txt. The following are two snippets from the script, including the part that writes the data to the text files:
my %manager_to_directs;
my %user_to_manager;
my @users;
my $url = "https://xxxxxxxxxxxxxx.com/api/v1/reports/active/week";
my $useragent = LWP::UserAgent->new();
my $response = $useragent->get(($url));
if ($response->code !~ "200" || $response->code !~ "204" ){
while ($url && $url ne "Null") {
my $data = fetch_json($url);
last if !defined $data;
$url = $data->{next};
.
.
.
# write backyard.txt
open my $backyard_fh, ">", "backyard.txt";
foreach my $user (sort keys %user_to_management_chain) {
my $chain = join ',', @{$user_to_management_chain{$user}};
print $backyard_fh "$user:$chain\n";
}
close $backyard_fh;
# write teams.txt
open my $team_fh, ">", "teams.txt";
foreach my $user (sort @users) {
my $followers = $manager_to_followers{$user};
my $followers_joined = $followers ? join (',', sort @$followers) : "";
print $team_fh "$user:$followers_joined\n";
}
close $team_fh;
# write backup.txt, backup for teams.txt
open my $backup_fh, ">", "backup.txt";
copy("teams.txt", "backup.txt")
or die ("Can't copy teams.txt \n");
close $backup_fh;
This works almost exactly how I want it to, but now I've been testing with a negative scenario, where the .json url provided in the script is false/nonexistent, and I have to make sure that not another teams.txt file is created and the backup.txt file is still retained from the last execution.
I tested by replacing
my $url = "https://xxxxxxxxxxxxxx.com/api/v1/reports/active/week";
with
my $url = "https://fakeUrl.com/api/v1/reports/active/week";
In this scenario a 404 is returned and the program is supposed to fail. With this test, I noticed that the contents of teams.txt and backyard.txt get wiped, but the backup.txt file gets wiped too, and that's not good.
I'm fine with teams.txt and backyard.txt being overwritten per each run of the script, but I need the backup.txt file to be retained no matter what, unless the program runs successfully and there's new content from teams.txt to be copied over to backup.txt.
Any help I can get is highly appreciated!
The following code snippets are taken almost directly from the module documentation.
Maybe you should try this approach.
use strict;
use warnings;
use feature 'say';
use LWP::UserAgent ();
my $url = 'https://metacpan.org/pod/HTTP::Tiny';
$url = 'https://fakeUrl.com/api/v1/reports/active/week';
my $ua = LWP::UserAgent->new(timeout => 10);
$ua->env_proxy;
my $response = $ua->get($url);
my $data;
if ($response->is_success) {
$data = $response->decoded_content;
}
else {
die $response->status_line;
}
# Process further data
say $data;
Output
500 Can't connect to fakeUrl.com:443 (Bad file descriptor) at C:\....\http_lwp.pl line 19.
use strict;
use warnings;
use feature 'say';
use HTTP::Tiny;
my $url = 'https://metacpan.org/pod/HTTP::Tiny';
$url = 'https://fakeUrl.com/api/v1/reports/active/week';
my $data;
my $response = HTTP::Tiny->new->get($url);
if( $response->{success} ) {
$data = $response->{content};
} else {
say "$response->{status} $response->{reason}";
exit 1;
}
# Process further data
say $data;
Output
403 Forbidden
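On the backup question specifically: opening backup.txt with ">" truncates it immediately, and copy then overwrites it with whatever is in teams.txt, even when that file is empty after a failed fetch. One possible arrangement is sketched below, assuming all data has been fetched before any file is written. The function write_teams_with_backup and its arguments are hypothetical and only illustrate the ordering.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: $lines is an array ref of "user:followers" strings,
# or undef/empty when the fetch failed. Returns 1 if files were written.
sub write_teams_with_backup {
    my ($lines, $teams_path, $backup_path) = @_;

    # Nothing fetched: leave both files exactly as the last good run left them.
    return 0 unless defined $lines && @$lines;

    open my $fh, '>', $teams_path or die "Cannot write $teams_path: $!";
    print {$fh} "$_\n" for @$lines;
    close $fh or die "Cannot close $teams_path: $!";

    # copy() opens the destination itself; no separate open with '>' is needed.
    copy($teams_path, $backup_path) or die "Cannot copy to $backup_path: $!";
    return 1;
}
```

Calling die when the response is not successful, as in the snippets above, has the same protective effect as the early return here, provided it happens before any output file is opened.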