Perl: search for a particular file extension in a folder and its sub-folders

I have a folder containing over 1500 files with the extension .fna, scattered across different sub-folders. Is there a simple way in Perl to collect all these files and store them in a different location?

As File::Find is recommended everywhere, let me add that there are other, sometimes nicer options, such as https://metacpan.org/pod/Path::Iterator::Rule or Path::Class's traverse function.

Which OS are you using? If it's Windows, I think a simple xcopy command would be a lot easier. Open a console window and type "xcopy /?" to get info on this command. It should be something as simple as:
xcopy directory1\*.fna directory2 /s

use File::Find;

my @files;
find(\&search, '/some/path');
doSomethingWith(@files);
exit;

sub search {
    # File::Find expects directories, not glob patterns, so filter here;
    # $_ is the bare file name inside the wanted routine
    push @files, $File::Find::name if -f && /\.fna$/;
    return;
}
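To actually store the matched files somewhere else, as the question asks, File::Copy can be combined with the traversal. A minimal sketch, with assumed source and target paths:

use File::Find;
use File::Copy;

my $source = '/some/path';          # assumed source tree
my $target = '/some/other/path';    # assumed (existing) target directory

find(sub {
    return unless -f && /\.fna$/;
    # find() has chdir'd into the file's directory, so $_ is a valid relative path
    copy($_, "$target/$_") or warn "Could not copy $File::Find::name: $!";
}, $source);

Swap copy() for move() if the files should be relocated rather than duplicated.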

Without much more information to go on, you don't need a Perl script for something as easy as this.
Here's a *nix one-liner:
find /source/dir -name "*.fna" -exec mv -t /target/dir '{}' + -print

Sorry for the late response; I was away at a conference. Here is my code, which seems to work fine so far.
use strict;
use warnings;
use Cwd;
use FileHandle;

open my $out, '>>', 'results7.txt' or die "Cannot open results7.txt: $!";
my $parent = "/home/denis/Denis_data/Ordered species";

opendir(my $par_dir, $parent) or die "Cannot open $parent: $!";
while (my $sub_folders = readdir($par_dir)) {
    next if ($sub_folders =~ /^\.\.?$/);    # skip . and .. (the dots must be escaped)
    my $path = $parent . '/' . $sub_folders;
    next unless (-d $path);                 # skip anything that isn't a directory

    chdir($path) or die "Cannot chdir to $path: $!";
    system 'perl batch_hmm.pl';
    print $out $path . "\n";
}
closedir($par_dir);
I will also try the File::Find option. The above looks quite messy.

Related

File::Find in Perl - Looking for files only

I have a script like this to list every FILE inside my root path:
use strict;
use File::Find qw(find);

my $path = "<my root path>";
find(\&Search, $path);

sub Search {
    my $filename = $File::Find::name;
    if (-f $filename) {
        print $filename . "\n";
    }
}
My goal is to list all the FILES. However, it also lists the symlinks inside my root path. I modified my Search function like this and it worked:
sub Search {
    my $filename = $File::Find::name;
    # Check that $filename is not a symlink first
    if (!-l $filename) {
        if (-f $filename) {
            print $filename . "\n";
        }
    }
}
But it seems awkward, right? Why do we need two if conditions just to verify that $filename is a real file and not a symlink?
Can anyone suggest a better, cleaner solution for this?
Thank you and best regards.
Alex
-f tests for a plain file, but it uses stat, which follows symlinks, so a symlink that points to a regular file passes too. So yes, you do have to test both.
One slightly useful thing is that you can probably just do:
if ( -f and not -l ) {
because File::Find sets $_ to the current file, and the file tests default to using that too. (This won't work if you turn on no_chdir, though.)
You may also want to consider File::Find::Rule as an alternative to File::Find.
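For instance, a minimal sketch with File::Find::Rule, assuming its file and symlink tests (wrappers around -f and -l) and its not() combinator:

use File::Find::Rule;

my $path = "<my root path>";

# keep plain files, but drop anything that is itself a symlink
my @files = File::Find::Rule->file
                            ->not( File::Find::Rule->new->symlink )
                            ->in($path);
print "$_\n" for @files;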
stat and lstat are identical except when it comes to symlinks. The former collects information about the linked file, whereas the latter collects information about the link itself.
The -X EXPR file tests use stat, so lstat is needed here:
sub Search {
    my $filename = $File::Find::name;

    # lstat examines the link itself instead of following it
    if (!lstat($filename)) {
        warn("Can't stat $filename: $!\n");
        return;
    }

    # "_" reuses the results of the lstat above; say needs use v5.10
    say $filename if -f _;
}
Bonus: Error checking becomes much simpler when you pre-call stat or lstat.

Recursive directory traversal in Perl

I'm trying to write a script that prints out the file structure starting at the folder the script is located in. The script works fine without the recursive call, but with that call it prints the contents of the first folder and then crashes with the following message: closedir() attempted on invalid dirhandle DIR at printFiles.pl line 24. The folders are printed and execution reaches the last line, but why isn't the recursive call made? And how should I solve this instead?
#!/usr/bin/perl -w
printDir(".");

sub printDir {
    opendir(DIR, $_[0]);
    local(@files);
    local(@dirs);
    (@files) = readdir(DIR);
    foreach $file (@files) {
        if (-f $file) {
            print $file . "\n";
        }
        if (-d $file && $file ne "." && $file ne "..") {
            push(@dirs, $file);
        }
    }
    foreach $dir (@dirs) {
        print "\n";
        print $dir . "\n";
        printDir($dir);
    }
    closedir(DIR);
}
You should always use strict; and use warnings; at the start of your Perl program, especially before you ask for help with it. That way Perl will show up a lot of straightforward errors that you may not notice otherwise.
The invalid dirhandle error occurs because DIR is a global directory handle: every level of the recursion shares the same DIR, so the innermost closedir closes it for everyone, and the outer call's closedir then fails. It is best to always use lexical handles for both files and directories, and to test the return code to make sure the open succeeded, like this
opendir my $dh, $_[0] or die "Failed to open $_[0]: $!";
One advantage of lexical file handles is that they are closed implicitly when they go out of scope, so there is no need for your closedir call at the end of the subroutine.
local isn't meant to be used like that. It doesn't serve as a declaration; it creates a temporary copy of a global variable that everything can access. It is best to use my instead, like this
my @dirs;
my @files = readdir $dh;
Also, the file names you are using from readdir have no path, and so your file tests will fail unless you either chdir to the directory being processed or append the directory path string to the file name before testing it.
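Putting those three fixes together, a minimal corrected version of the subroutine might look like this (a sketch, not the original code):

sub printDir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Failed to open $dir: $!";

    my (@files, @dirs);
    for my $entry (readdir $dh) {
        next if $entry eq '.' or $entry eq '..';
        my $path = "$dir/$entry";    # full path, so -f and -d test the right file
        if    (-f $path) { print "$path\n"; }
        elsif (-d $path) { push @dirs, $path; }
    }

    # $dh is closed implicitly when it goes out of scope
    printDir($_) for @dirs;
}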
Use the File::Find module. The way I usually do this is with the find2perl tool that comes with Perl, which takes the same parameters as find and creates a suitable Perl script using File::Find. Then I fine-tune the generated script to do what I want. But it's also possible to use File::Find directly.
Why not use File::Find?
use strict;   # ALWAYS!
use warnings; # ALWAYS!
use File::Find;

find(sub { print "$_\n" }, ".");

How to call script within subfolders

I have a script that I'm using to remove duplicate calendar entries. The root mail folder contains a folder for each user, named firstname_lastname, and beneath each one is /Calendar/#msgs/.
As of now, I'm running the script manually by going to each user's folder and starting it with /Users/Documents/duplicates/dups.pl . --killdups
Is there a way that I could easily have it loop through all the users' mail folders, look in each respective /Calendar/#msgs/ folder, and run the script?
There are a couple ways you can go with this, depending on what you want to do.
First, you can make your script search every folder under its starting directory, without specifying anything on the command line.
use File::Spec::Functions qw(catfile);

my @users = glob( '/Users/*' );

foreach my $user ( @users ) { # $user looks like /Users/Buster
    my $calendar_dir = catfile( $user, 'Calendar', '#msgs' );
    ...
}
You could also use opendir to get the list of users, so you get back one directory at a time:

opendir my $dh, '/Users' or die ...;

while ( my $user = readdir $dh ) {
    next if $user =~ /^\.\.?\z/; # and anything else you want to skip
    ...                          # do the cool stuff
}
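For instance, the "cool stuff" could be a sketch like this, mirroring the manual process from the question. The dups.pl path and --killdups flag are the asker's own; everything else is assumed, including whether dups.pl wants to be run from inside the folder:

use File::Spec::Functions qw(catfile);

my $script = '/Users/Documents/duplicates/dups.pl';

opendir my $dh, '/Users' or die "Can't open /Users: $!";
while ( my $user = readdir $dh ) {
    next if $user =~ /^\.\.?\z/;
    my $calendar_dir = catfile( '/Users', $user, 'Calendar', '#msgs' );
    next unless -d $calendar_dir;

    # run the duplicate-killer from inside the folder, as done by hand
    chdir $calendar_dir or do { warn "chdir $calendar_dir: $!"; next };
    system( $^X, $script, '.', '--killdups' ) == 0
        or warn "dups.pl failed in $calendar_dir: $?";
}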
Second, you can make it search selected folders. Suppose that you are in your home directory. To kill the duplicates for particular users, you'd call your script with those users' names:
dups.pl --killdups Buster Mimi Roscoe
To go through all users, maybe something like this (it almost looks like you are on MacOS X, but not quite, so I'm not sure which path you need), using a command-line glob:
dups.pl --killdups /Users/*
The solution looks similar, but you take the users from @ARGV instead of using a glob:

foreach my $user ( @ARGV ) {
    ...
}
That should be enough to get you started. You'll have to integrate this with the rest of your script and fix up the paths in each case to be what you need, but that's just simple string manipulation (or even simpler than that with File::Spec).
Pass in the folders it should look at on the command line. The arguments will be in #ARGV, you just loop over it.
Edit: Maybe you would prefer an elegant Perl solution?
#!/usr/bin/perl -w
# CC-by Cedric 'levif' Le Dillau.
use File::Find;

@ARGV = qw(.) unless @ARGV;

find sub { apply_to_folder($File::Find::name) if -d }, @ARGV;

sub apply_to_folder {
    my $folder = shift;
    printf "folder: %s\n", $folder;
}
Then, your apply_to_folder() function can do whatever you want.
Note that replacing -d with -f, or with -f && -x, changes what gets filtered.
(Help can be found with perldoc -f -X.)
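For example, to point it straight at every user's calendar folder from the question's layout (apply.pl is a hypothetical name for the script above):
$ perl apply.pl /Users/*/Calendar/#msgs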
An older suggestion was to try:
$ find "/Calendar/#msgs/" -type d -exec dups.pl "{}" --killdups \;
Or perl's opendir()/readdir() functions:
$ perldoc -f opendir

How can I copy a directory except for all of the hidden files in Perl?

I have a directory hierarchy with a bunch of files. Some of the file and directory names start with a dot.
I want to copy the hierarchy somewhere else, leaving out all files and directories that start with a dot.
How can one do that?
I think what you want is File::Copy::Recursive's rcopy_glob():

rcopy_glob()
This function lets you specify a pattern suitable for perl's glob() as the first argument. Subsequently each path returned by perl's glob() gets rcopy()ied.
It returns an array whose items are array refs that contain the return value of each rcopy() call.
It forces behavior as if $File::Copy::Recursive::CPRFComp is true.
If you're able to solve this problem without Perl, you should check out rsync. It's available on Unix-like systems, on Windows via cygwin, and perhaps as a stand-alone tool on Windows. It will do what you need and a whole lot more.
rsync -a -v --exclude='.*' foo/ bar/
If you aren't the owner of all of the files, use -rOlt instead of -a.
glob ignores dot files by default, so a rename over its results skips them:
perl -lwe'rename($_, "foo/$_") or warn "failure renaming $_: $!" for glob("*")'
(Note this renames rather than copies, and only handles the top level.)
The code below does the job in a simple way but doesn't handle symlinks, for example.
#! /usr/bin/perl

use warnings;
use strict;

use File::Basename;
use File::Copy;
use File::Find;
use File::Spec::Functions qw/ abs2rel catfile file_name_is_absolute rel2abs /;

die "Usage: $0 src dst\n" unless @ARGV == 2;

my ($src, $dst) = @ARGV;
$dst = rel2abs $dst unless file_name_is_absolute $dst;
$dst = catfile $dst, basename $src if -d $dst;

sub copy_nodots {
    # match "." itself (the top of the tree) or names that don't start with a dot
    if (/^\.\z|^[^.]/) {
        my $image = catfile $dst, abs2rel($File::Find::name, $src);
        if (-d $_) {
            mkdir $image
                or die "$0: mkdir $image: $!";
        }
        else {
            copy $_ => $image
                or die "$0: copy $File::Find::name => $image: $!\n";
        }
    }
    elsif (-d $_) {
        # don't descend into dot-directories that we aren't copying
        $File::Find::prune = 1;
    }
}

find \&copy_nodots => $src;
cp -r .??*
Almost perfect, because it misses files beginning with a dot followed by a single character, like .d or .e.
echo .[!.] .??*
This is even better, since .[!.] picks up the single-character names that .??* misses.
or:
shopt -s dotglob ; cp -a * destination; shopt -u dotglob
I found File::Copy::Recursive's rcopy_glob().
The following is what is shown in the docs, but it is deceptive:
use File::Copy::Recursive qw(fcopy rcopy dircopy fmove rmove dirmove);
It does not import rcopy_glob(); the only way I found to use it was to be explicit, as follows:
use File::Copy::Recursive;
File::Copy::Recursive::rcopy_glob("glob/like/path","dest/path");

Why does my jzip process hang when I call it with Perl's system?

I am definitely new to Perl, so please forgive me if this seems like a stupid question to you.
I am trying to unzip a bunch of .cab files with jzip in Perl (ActivePerl, jzip, Windows XP):
#!/usr/bin/perl

use strict;
use warnings;
use File::Find;
use IO::File;
use v5.10;

my $prefix = 'myfileprefix';
my $dir    = '.';

File::Find::find(
    sub {
        my $file = $_;
        return if -d $file;
        return if $file !~ /^$prefix(.*)\.cab$/;
        my $cmd = 'jzip -eo ' . $file;
        system($cmd);
    }, $dir
);
The code decompresses the first .cab file in the folder and then hangs (without any errors). It stays there until I press Ctrl+C to stop it. Does anyone know what the problem is?
EDIT: I used Process Explorer to inspect the processes, and I found that the correct number of jzip processes is fired up (one per cab file residing in the source folder). However, only one of them runs under cmd.exe => perl, and none of these processes shuts down after being fired. It seems to me I need to shut down each process and execute them one by one, but I have no clue how to do that in Perl. Any pointers?
EDIT: I also tried replacing jzip with notepad; it turns out it opens notepad with one file at a time (in sequential order), and only when I manually close notepad is another instance fired. Is this the common behavior in ActivePerl?
EDIT: I finally solved it, and I am still not entirely sure why. What I did was remove the XML library from the script, which should not be relevant. (Sorry, I purposely left "use XML::DOM" out of my original post as I thought it was completely irrelevant to this problem.)
OLD:
use strict;
use warnings;
use File::Find;
use IO::File;
use File::Copy;
use XML::DOM;
use DBI;
use v5.10;
NEW:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use IO::File;
use File::Copy;
use DBI;
use v5.10;
my $prefix = 'myfileprefix';
my $dir = '.';

# find cab files within the given folder
File::Find::find(
    sub {
        my $file = $_;
        return if -d $file;
        return if $file !~ /^$prefix(.*)\.cab$/;
        say $file;
        my $cmd = 'jzip -eo ' . $file;
        say $cmd;
        system($cmd);
    }, $dir
);
This, however, raises another problem: when the extracted file already exists, the script hangs again. I highly suspect this is a jzip problem, and an alternative is simply replacing jzip with extract, as @ghostdog74 pointed out below.
First off, if you are running commands via a system() call, you should always redirect their output and errors to a log, or at least process them within your program.
In this particular case, if you do that, you'll have a log of what every single command is doing and will see if/when any of them get stuck.
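A minimal sketch of that idea (the jzip invocation and $file variable are from the question's script; the log file name is assumed):

open my $log, '>>', 'jzip.log' or die "Can't open log: $!";

my $output = `jzip -eo $file 2>&1`;    # capture stdout and stderr together
my $status = $? >> 8;                  # exit code of the external command
print $log "[$file] exit=$status\n$output\n";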
Second, just a general tip: it's a good idea to prefer native Perl libraries. In this case that may be impossible, of course. I'm not that experienced with Perl on Windows, so I have no clue whether there's a jzip module for Perl, but search CPAN.
UPDATE: I didn't find a native Perl CAB extractor, but I found a jzip replacement that might work better and is worth a try: http://www.cabextract.org.uk/ - there's a DOS version which will hopefully work on Windows.
Based on your edit, this is what I suggest:
#!/usr/bin/perl

use strict;
use warnings;
use File::Find;
use IO::File;
use v5.10;

my $prefix = 'myfileprefix';
my $dir    = '.';

my @commands;
File::Find::find(
    sub {
        my $file = $_;
        return if -d $file;
        return if $file !~ /^$prefix(.*)\.cab$/;
        my $cmd = "jzip -eo $File::Find::name";
        push @commands, $cmd;
    }, $dir
);
# asynchronously kick off jzips
for my $cmd (@commands) {
    my $fresult = fork();
    if (!defined($fresult)) {
        die("Fork failed");
    }
    elsif ($fresult == 0) {    # child
        `$cmd`;
        exit;                  # the child must exit, or it keeps looping and forking
    }
    else {
        # parent: no-op, just keep moving
    }
}
edit: added asynch. edit2: fixed scope issue.
What happens when you run the jzip command from the DOS window? Does it work correctly? What happens if you add an end-of-line character (\n) to the command in the script? Does that prevent the hang?
Here's an alternative, using extract.exe, which you can download here or here:

use strict;
use warnings;
use File::Find;
use IO::File;
use v5.10;

my $prefix = 'myfileprefix';
my $dir    = '.';

File::Find::find({ wanted => \&wanted }, $dir);
exit;

sub wanted {
    my $destination = q(c:\test\temp);
    if (-f $_ && $_ =~ /^$prefix(.*)\.cab$/) {
        # $File::Find::name already includes the directory part;
        # extract.exe expects backslashes rather than forward slashes
        (my $win_path = $File::Find::name) =~ s{/}{\\}g;
        my $cmd = "extract /Y $win_path /E /L $destination";
        print $cmd . "\n";
        system($cmd);
    }
}
Although no one has mentioned it explicitly, system blocks until the process finishes. The real problem, as people have noted, is figuring out why the process doesn't exit. Forking or any other parallelization won't help because you'll be left with a lot of hung processes.
Until you can figure out the issue, start small. Make the smallest Perl script that demonstrates the problem:
#!perl
system( '/path/to/jzip', '-eo', 'literal_file_name' ); # full path, list syntax!
print "I finished!\n";
Now the trick is to figure out why it hangs, and sometimes that means different solutions for different external programs. Sometimes you need to close STDIN before you run the external process, or it will sit there waiting for that handle to close; sometimes you need to do something else entirely.
Instead of system, you might also try modules such as IPC::System::Simple, which handles a lot of platform-specific details for you, or IPC::Run or IPC::Open3.
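For instance, a minimal sketch with IPC::System::Simple (the jzip path and arguments are the placeholders from the test script above):

use IPC::System::Simple qw(capturex);

# capturex bypasses the shell, captures the command's output, and dies
# with a useful message if the command can't start or exits non-zero
my $output = capturex('/path/to/jzip', '-eo', 'literal_file_name');
print $output;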
Sometimes it just sucks, and this situation is one of those times.