Sending GIFs by FTP to different folders on a server - Perl

I use Perl to send GIFs to a server. I want to upload some GIFs to a folder called precipitation and others to a folder called wind. With the following code (well, it is part of the code I use) I send them, but something is wrong, because all the GIFs end up in the same folder, the first one, precipitation. Any idea?
use BSD::Resource;
use File::Copy;
use Net::FTP;

# ACTIONS: send GIFs to the precipitation folder
$ftp->cwd("html/pen/precipitation");
foreach my $file ($ftp->ls("Pl*.gif")) {
    $ftp->delete($file) or die "Error in delete\n";
}
my @arxius = glob("/home/gif/Pen/Pl*.gif");
foreach my $File (@arxius) {
    $ftp->binary();
    $ftp->put($File);
}

# ACTIONS: send GIFs to the wind folder
$ftp->cwd("html/pen/wind");
foreach my $file2 ($ftp->ls("vent*.gif")) {
    $ftp->delete($file2) or die "Error in delete\n";
}
my @arxius2 = glob("/home/gif/Pen/vent*.gif");
foreach my $File (@arxius2) {
    $ftp->binary();
    $ftp->put($File);
}

The behavior indicates that the second call to cwd() failed.
Most likely this is because you are using a relative rather than an absolute path: the second cwd() call is resolved relative to the directory set by the first one, so it tries to go to html/pen/precipitation/html/pen/wind, which doesn't appear to be what you want.
Use an absolute path, or ../wind, in the second cwd() call.
Also, you should check that the cwd() calls succeed and stop if you didn't change to the expected directory. Otherwise, you are performing potentially destructive actions (like deleting files) in the wrong place! cwd() returns true if it worked and false otherwise; see the Net::FTP documentation.
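A minimal sketch along those lines; the connection details are placeholders, the refresh_dir helper is my own framing, and the leading slash on the remote paths assumes those directories hang off the FTP root (adjust to your server's layout):
use strict;
use warnings;
use Net::FTP;

# Placeholder connection details; substitute your own.
my ($host, $user, $pass) = ('ftp.example.com', 'user', 'password');
my $ftp = Net::FTP->new($host) or die "Cannot connect to $host: $@";
$ftp->login($user, $pass)      or die "Cannot login: " . $ftp->message;

# Delete the old copies in $remote_dir, then upload every local file
# matching $local_glob. Stops as soon as anything fails.
sub refresh_dir {
    my ($ftp, $remote_dir, $remote_glob, $local_glob) = @_;

    # Absolute remote path, and abort if the chdir fails; otherwise
    # the deletes below would run in the wrong directory.
    $ftp->cwd($remote_dir) or die "Cannot cwd to $remote_dir: " . $ftp->message;

    foreach my $old ($ftp->ls($remote_glob)) {
        $ftp->delete($old) or die "Cannot delete $old: " . $ftp->message;
    }

    $ftp->binary();
    foreach my $file (glob $local_glob) {
        $ftp->put($file) or die "Cannot put $file: " . $ftp->message;
    }
}

refresh_dir($ftp, "/html/pen/precipitation", "Pl*.gif",   "/home/gif/Pen/Pl*.gif");
refresh_dir($ftp, "/html/pen/wind",          "vent*.gif", "/home/gif/Pen/vent*.gif");
$ftp->quit;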

Related

About searching recursively in Perl

I have a Perl script that I, well, mostly pieced together from questions on this site. I've read the documentation on some parts to better understand it. Anyway, here it is:
#!/usr/bin/perl
use File::Find;

my $dir = '/home/jdoe';
my $string = "hard-coded pattern to match";

find(\&printFile, $dir);

sub printFile
{
    my $element = $_;
    if (-f $element && $element =~ /\.txt$/)
    {
        open my $in, "<", $element or die $!;
        while (<$in>)
        {
            if (/\Q$string\E/)
            {
                print "$File::Find::name\n";
                last; # stops looking after match is found
            }
        }
    }
}
This is a simple script that, similar to grep, looks down recursively through directories for a matching string, then prints the location of each file that contains it. It works, but only if the file is located in my home directory. If I change the hard-coded search to look in a different directory (that I have permissions in), for example /admin/programs, the script no longer seems to do anything: no output is displayed, even when I know it should be matching at least one file (tested by making a file in /admin/programs containing the hard-coded pattern). Why am I experiencing this behavior?
Also, I might as well admit that this isn't a really useful script (heck, this would be so easy with grep or awk!), but understanding how to do this in Perl is important to me right now. Thanks.
EDIT: Found the problem. A simple oversight: the files in the directory I was looking in did not have .txt extensions. Thanks for helping me find that.
I was able to get the desired output from the code you pasted by making a few changes:
use strict;
use warnings;
You should always use them; they flag many errors in your code that you might otherwise miss.
Next, I changed the line:
my $dir = './home/jdoe'; ##'./admin/programs'
The . signifies the current directory. Also, if you still have problems, try using an absolute path (from the root) instead of a relative path. Let me know if this solves your problem.
This script works fine for me without any issue. The one thing hidden from us is the pattern. You could share the pattern and tell us what you expect it to match, so that we can validate it.
You could also run your program in debug mode, i.e.
perl -d your_program
That drops you into the debugger, where many options are available for inspecting the program flow. Type 'n' at the debug prompt to step through the code and see how it flows; each 'n' prints the current execution point and its result.
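For reference, a minimal reworking of the script along those lines: strict and warnings enabled, and the directory and pattern taken from the command line instead of being hard-coded (the argument handling is my own addition, not part of the original script). Note the /\.txt$/ filter, which turned out to be the culprit in the edit above:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Usage: perl findstring.pl <directory> <string>
my ($dir, $string) = @ARGV;
die "Usage: $0 <directory> <string>\n" unless defined $dir && defined $string;

find(\&print_file, $dir);

sub print_file {
    my $element = $_;

    # Only plain files ending in .txt are searched; drop or change the
    # /\.txt$/ test if your files use a different extension.
    return unless -f $element && $element =~ /\.txt$/;

    open my $in, '<', $element or die "Cannot open $element: $!";
    while (<$in>) {
        if (/\Q$string\E/) {
            print "$File::Find::name\n";
            last; # stop after the first match in this file
        }
    }
    close $in;
}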

Change output filename from WGET when using input file option

I have a Perl script that I wrote that gets some image URLs, puts the URLs into an input file, and proceeds to run wget with the --input-file option. This works perfectly... or at least it did, as long as the image filenames were unique.
I have a new company sending me data and they use a very TROUBLESOME naming scheme. All files have the same name, 0.jpg, in different folders.
for example:
cdn.blah.com/folder/folder/202793000/202793123/0.jpg
cdn.blah.com/folder/folder/198478000/198478725/0.jpg
cdn.blah.com/folder/folder/198594000/198594080/0.jpg
When I run my script with this, wget works fine and downloads all the images, but they are titled 0.jpg.1, 0.jpg.2, 0.jpg.3, etc. I can't just count them and rename them because files can be broken, not available, whatever.
I tried running wget once for each file with -O, but it's embarrassingly slow: starting the program, connecting to the site, downloading, and ending the program. Thousands of times. It's an hour vs minutes.
So, I'm trying to find a method to change the output filenames from wget without it taking so long. The original approach works so well that I don't want to change it too much unless necessary, but I am open to suggestions.
Additional:
LWP::Simple is too simple for this. Yes, it works, but very slowly. It has the same problem as running individual wget commands: each get() or getstore() call makes the system reconnect to the server. Since the files are so small (60 kB on average) and there are so many to process (1,851 for this one test file alone), the connection time is considerable.
The filename I will be using can be extracted with /\/(\d+)\/(\d+\.jpg)/i, where the new filename is simply $1$2, e.g. 2027931230.jpg. Not really important for this question.
I'm now looking at LWP::UserAgent with LWP::ConnCache, but it times out and/or hangs on my PC. I will need to adjust the timeout and retry values. The inaugural run of the code downloaded 693 images (43 MB) in just a couple of minutes before it hung. Using LWP::Simple, I only got 200 images in 5 minutes.
use LWP::UserAgent;
use LWP::ConnCache;

chomp(@filelist = <INPUTFILE>);

my $browser = LWP::UserAgent->new;
$browser->conn_cache(LWP::ConnCache->new());

foreach (@filelist) {
    /\/(\d+)\/(\d+\.jpg)/i;
    my $newfilename = $1 . $2;
    $response = $browser->mirror($_, $folder . $newfilename);
    die 'response failure' if $response->is_error();
}
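For reference, a small variation on that loop with an explicit timeout and LWP's built-in keep_alive connection cache; the 30-second timeout, the urls.txt input file, and the images/ output folder are placeholders rather than values from the original script:
use strict;
use warnings;
use LWP::UserAgent;

my $folder = 'images/';   # placeholder output directory

# keep_alive => 1 gives the agent an in-memory connection cache, so
# repeated requests to the same host reuse one TCP connection.
my $browser = LWP::UserAgent->new(
    keep_alive => 1,
    timeout    => 30,      # seconds; tune to taste
);

open my $in, '<', 'urls.txt' or die "Cannot open urls.txt: $!";
while (my $url = <$in>) {
    chomp $url;
    next unless $url =~ /\/(\d+)\/(\d+\.jpg)/i;
    my $newfilename = $1 . $2;

    # mirror() only downloads when the local copy is missing or stale.
    my $response = $browser->mirror($url, $folder . $newfilename);
    warn "Failed to fetch $url: ", $response->status_line, "\n"
        if $response->is_error;
}
close $in;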
LWP::Simple's getstore function allows you to specify a URL to fetch from and the filename to store the data from it in. It's an excellent module for many of the same use cases as wget, but with the benefit of being a Perl module (i.e. no need to outsource to the shell or spawn off child processes).
use LWP::Simple;

# Grab the filename from the end of the URL
my $filename = (split '/', $url)[-1];

# If the file exists, increment its name
while (-e $filename)
{
    $filename =~ s{ (\d+)[.]jpg }{ $1+1 . '.jpg' }ex
        or die "Unexpected filename encountered";
}

getstore($url, $filename);
The question doesn't specify exactly what kind of renaming scheme you need, but this will work for the examples given by simply incrementing the filename until the current directory doesn't contain that filename.
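To show that snippet in context, here is a sketch of wrapping it in a loop over the URL list file; the urls.txt filename and the fetch_unique helper are illustrative, not from the original post:
use strict;
use warnings;
use LWP::Simple;

# Store one URL, bumping the numeric part of the local name on collision.
sub fetch_unique {
    my ($url) = @_;
    my $filename = (split '/', $url)[-1];
    while (-e $filename) {
        $filename =~ s{ (\d+)[.]jpg }{ $1+1 . '.jpg' }ex
            or die "Unexpected filename encountered";
    }
    my $status = getstore($url, $filename);
    warn "Failed to fetch $url (status $status)\n" unless is_success($status);
}

open my $in, '<', 'urls.txt' or die "Cannot open urls.txt: $!";
while (my $url = <$in>) {
    chomp $url;
    fetch_unique($url);
}
close $in;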

Downloads in Firefox using Perl WWW::Mechanize::Firefox

I have a list of URLs of PDF files that I want to download from different sites.
In my Firefox I have chosen the option to save PDF files directly to a particular folder.
My plan was to use WWW::Mechanize::Firefox in Perl to download each file in the list, one by one, using Firefox, and to rename each file after download.
I used the following code to do it:
use WWW::Mechanize::Firefox;
use File::Copy;

# @list contains the list of links to PDF files
foreach $x (@list) {
    my $mech = WWW::Mechanize::Firefox->new(autoclose => 1);
    $mech->get($x); # This downloads the file using Firefox into the desired folder
    opendir(DIR, "output/download");
    @FILES = readdir(DIR);
    my $old = "output/download/$FILES[2]";
    move($old, $new); # $new is the new filename
}
When I run the script, it opens the first link in Firefox and Firefox downloads the file to the desired directory. But after that the new tab is not closed, the file does not get renamed, and the code keeps running (as if it had hit an endless loop); no further files get downloaded.
What is going on here? Why isn't the code working? How do I close the tab and make the code process all the files in the list? Is there an alternative way to download them?
Solved the problem.
The function,
$mech->get()
waits for the 'DOMContentLoaded' event to be fired by Firefox when the page loads. As I had set Firefox to download the files automatically, no page was being loaded, so the 'DOMContentLoaded' event was never fired. This is what made my code hang.
I told the function not to wait for the page to load by using the following option:
$mech->get($x, synchronize => 0);
After this, I added a 60-second delay to allow Firefox to download the file before the code progresses:
sleep 60;
Thus, my final code looks like:
use WWW::Mechanize::Firefox;
use File::Copy;

# @list contains the list of links to PDF files
foreach $x (@list) {
    my $mech = WWW::Mechanize::Firefox->new(autoclose => 1);
    $mech->get($x, synchronize => 0);
    sleep 60;
    opendir(DIR, "output/download");
    @FILES = readdir(DIR);
    my $old = "output/download/$FILES[2]";
    move($old, $new); # $new is the new filename
}
If I understood you correctly, you have links to the actual PDF files.
In that case WWW::Mechanize is most likely easier than WWW::Mechanize::Firefox. In fact, I think that is almost always the case. Then again, watching the browser work is certainly cooler.
use strict;
use warnings;
use WWW::Mechanize;

# your code here
# loop
my $mech = WWW::Mechanize->new();    # Could (should?) be outside of the loop.
$mech->agent_alias("Linux Mozilla"); # Optionally pretend to be whatever you want.
$mech->get($link);
$mech->save_content($new);
# end of the loop
If that is absolutely not what you wanted, my cover story will be that I did not want to break my 666 rep!
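Filling in the loop placeholders from the snippet above, a minimal sketch could look like this; the pdf-urls.txt input file and the habit of naming each local copy after the last URL path segment are assumptions for illustration:
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(autocheck => 1);  # die automatically on HTTP errors
$mech->agent_alias("Linux Mozilla");             # optionally pretend to be a browser

open my $in, '<', 'pdf-urls.txt' or die "Cannot open pdf-urls.txt: $!";
while (my $link = <$in>) {
    chomp $link;

    # Name the local copy after the last path segment of the URL.
    my ($name) = $link =~ m{/([^/]+\.pdf)$}i;
    $name ||= 'unnamed.pdf';

    $mech->get($link);
    $mech->save_content($name);
    print "Saved $link as $name\n";
}
close $in;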

Perl: open a filehandle, write into it, give it a name later on?

I think I've read how to do this somewhere but I can't find where. Maybe it's only possible in new(ish) versions of Perl. I am using 5.14.2:
I have a Perl script that writes results into a file if certain criteria are met. Given the structure of the script, it is more logical to write down the results first and only later check whether the criteria for saving them to a file are met.
I think I've read somewhere that I can write content into a filehandle, which in Linux I guess corresponds to a temporary file or a pipe of some sort, and then later give that file its name, including the directory where it should go. If not, the content is discarded when the script finishes.
Other than faffing around with temporary files and deleting them manually, is there a straightforward way of doing this in Perl?
There's no simple (UNIX) facility for what you describe, but the behavior can be composed out of basic system operations. Perl's File::Temp already does most of what you want:
use File::Temp;

my $tmp = File::Temp->new;        # Will be unlinked at end of program.

while ($work_to_do) {
    print $tmp a_lot_of_stuff();  # $tmp is a filehandle
}

if ($save_it) {
    rename($tmp, $new_file);      # $tmp is also a string. Move (rename) the file.
}                                 # If you need this to work across filesystems, you
                                  # might want to "use File::Copy qw(move)" instead.

exit;  # $tmp will be unlinked here if it was not renamed
I use File::Temp for this.
But you should keep in mind that File::Temp deletes the file by default. That is OK, but in my case I don't want that while debugging: if the script terminates and the output is not the desired one, I cannot check the temp file.
So I prefer to set $File::Temp::KEEP_ALL = 1, or $fh->unlink_on_destroy(0) when using the OO interface, or ($fh, $filename) = tempfile($template, UNLINK => 0), and then unlink the file myself or move it to its proper place.
It would also be safer to move the file after closing the filehandle (just in case there is some buffering going on). So I would prefer an approach where the temp file is not deleted by default and, when all is done, a conditional either deletes it or moves it to the desired place and name.
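A small sketch of that approach, with the temp file kept by default and a conditional at the end deciding its fate; the $save_it flag, the results_XXXX template, and the final results.txt name are placeholders:
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Copy qw(move);

my $save_it = 1;   # placeholder for the "criteria met" flag from the question

# UNLINK => 0: File::Temp will NOT delete the file automatically, so it
# survives for inspection if the script dies half-way through.
my ($tmp, $tmpname) = tempfile('results_XXXX', UNLINK => 0);

print {$tmp} "some results\n";                  # write as the script runs
close $tmp or die "Cannot close $tmpname: $!";  # flush before moving

if ($save_it) {
    move($tmpname, 'results.txt')               # placeholder final name
        or die "Cannot move $tmpname: $!";
} else {
    unlink $tmpname or warn "Cannot remove $tmpname: $!";
}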

Accessing a file in Perl

In my script I am opening files and writing to them. I found that there is something wrong with a file I try to open: the file exists, it is not empty, and I am passing the right path to the filehandle.
I know that my question might sound weird, but while I was debugging my code I put the following command in my script to check some files:
system ("ls");
Then my script worked well; when it is removed, it no longer works correctly.
my @unique = ("test1", "test2");
open(unique_fh, ">orfs");
print unique_fh @unique;

open(ORF, "orfs") or die("file does not exist");
system ("ls");

while (<ORF>) {
    split;
}
@neworfs = @_;
print @neworfs;
Perl buffers the output when you print to a file. In other words, it doesn't actually write to the file every time you say print; it saves up a bunch of data and writes it all at once. This is faster.
In your case, you couldn't see anything you had written to the file, because Perl hadn't written anything yet. Adding the system("ls") call, however, caused Perl to write your output first (the interpreter is smart enough to do this, because it thinks you might want to use the system() call to do something with the file you just created).
How do you get around this? You can close the file before you open it again to read it, as choroba suggested. Or you can disable buffering for that file. Put this code just after you open the file:
my $fh = select (unique_fh);
$|=1;
select ($fh);
Then anytime you print to the file, it will get written immediately ($| is a special variable that sets the output buffering behavior).
Closing the file first is probably a better idea, although it is possible to have read and write filehandles open on the same file at the same time.
You did not close the filehandle before trying to read from the same file.
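A minimal sketch of that fix, staying close to the original snippet but closing the write handle before reopening the file for reading; the lexical filehandles and the per-line split are my own tidy-ups:
use strict;
use warnings;

my @unique = ("test1", "test2");

# Write the data, then close the handle so the buffer is flushed to disk.
open my $out, '>', 'orfs' or die "Cannot write orfs: $!";
print {$out} map { "$_\n" } @unique;
close $out or die "Cannot close orfs: $!";

# Now it is safe to reopen the same file for reading.
open my $in, '<', 'orfs' or die "Cannot read orfs: $!";
my @neworfs;
while (<$in>) {
    push @neworfs, split;   # split on whitespace
}
close $in;

print "@neworfs\n";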