Reading the contents of directories in Perl

I am trying to accomplish two things with a Perl script. I have a main directory whose first level of subdirectories are user directories, and each of these user directories contains some folders that have text files in them. I am trying to write a Perl script that

Lists the directories for each user
Gets the total number of .txt files

For the second objective I have this code:

my @emails = glob "$dir/*.txt";
for (0..$#emails) {
    $emails[$_] =~ s/\.txt$//;
}
$emails = @emails;

but $emails is returning 0. Any insight?

Typically, using glob is not a very good idea when it comes to processing files in directories and possibly their subdirectories.
A much better way is to use the File::Find module, like this:
use File::Find;

my @emails;
File::Find::find(
    {
        # without this, find() chdirs as it goes and the -f test below
        # would fail for a relative $dir:
        no_chdir => 1,
        # this will be called for each file under $dir:
        wanted => sub {
            my $file = $File::Find::name;
            return unless -f $file and $file =~ /\.txt$/;
            # this file looks like an email, remember it:
            push @emails, $file;
        },
    },
    $dir
);

print "Found " . scalar @emails . " .txt files (emails) in '$dir'\n";

Related

create new file listing all text files in a directory with perl

I am trying to list out all text files in a directory using perl. The below does run but the resulting file is empty. This seems close but maybe it is not what I need. Thank you :).
get_list.pl
#!/bin/perl
# create a list of all *.txt files in the current directory
opendir(DIR, ".");
@files = grep(/\..txt$/, readdir(DIR));
closedir(DIR);
# print all the filenames in our array
foreach $file (@files) {
    print "$file\n";
}
As written, your grep is wrong:

@files = grep(/\..txt$/,readdir(DIR));

In regular expressions, . means any character. So you will find a file called

fish.mtxt

but not a file called

fish.txt

because of that dot.
You probably want grep /\.txt$/, readdir(DIR)
But personally, I wouldn't bother, and would just use glob instead:

foreach my $file (glob "*.txt") {
    print $file, "\n";
}

Also - turn on use strict; use warnings;. Consider them mandatory until you know why you want to turn them off. (There are occasions, but you'll know what they are if you ever really need to.)
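Putting that together, a minimal strict-safe version of the original get_list.pl might look like this sketch:

#!/usr/bin/perl
use strict;
use warnings;

# print the name of every *.txt file in the current directory
foreach my $file (glob "*.txt") {
    print "$file\n";
}

To get the listing into a file, redirect the output, e.g. perl get_list.pl > list_of_files.txt (the output file name here is a hypothetical choice).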
You have one excess dot:

@files = grep(/\..txt$/,readdir(DIR));

should be:

@files = grep(/\.txt$/,readdir(DIR));

How can I create a new output file for each subfolder under a main folder using perl?

I have 100 subfolders in a main folder. They have different names. Each subfolder includes a .txt file, which has 10 columns. I want to get a new .txt file for each subfolder. Each new .txt file must be in its own folder; that is, I will have 2 .txt files (old and new) in each subfolder. I am trying to select the lines starting with "ATOM" and the columns 2, 6, 7 and 8 from each .txt file. My code is the following. It doesn't work correctly: it doesn't create a new .txt file. How can I figure out this problem?
#!/usr/bin/perl
$search_text = "ATOM";
@files = <*/*.txt>;
foreach $file (@files) {
    print $file . "\n";
    open(DATA, $file);
    open(OUT_FILE, ">$file a.txt");
    while ($line = <DATA>)
    {
        @fields = split /\s+/, $line;
        if ($line =~ m/$search_text/)
        {
            print OUT_FILE "$fields[2]\t$fields[6]\t$fields[7]\t$fields[8]\n";
        }
    }
}
close(OUT_FILE);
To put the output file a.txt into the same directory as the input file, you need to extract the directory name from the input file name, and prepend it to the output file name (a.txt). There are a couple of ways you can do that; probably the simplest is to use dirname() from the standard module File::Basename:
use File::Basename;
my $dir = dirname($file);
open(OUT_FILE, ">", "$dir/a.txt") or die "Failed to open $dir/a.txt: $!";
or you could use File::Spec directly:
use File::Spec;
my ($volume, $dir) = File::Spec->splitpath($file);
my $outname = File::Spec->catpath($volume, $dir, 'a.txt');
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
or you could just use a regexp substitution:
my $outname = ( $file =~ s![^/]+$!a.txt!r );
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
P.S. In any case, I'd recommend adopting several good habits that will help you write better Perl scripts:
Always start your scripts with use strict; and use warnings;. Fix any errors and warnings they produce. In particular, declare all your local variables with my to make them lexically scoped.
Check the return value of functions like open(), and abort the script if they fail. (I've done this in my examples above.)
Use the three-argument form of open(), as I also did in my examples above. It's a lot less likely to break if your filenames contain funny characters.
Consider using lexically scoped file handles (open my $out_file, ...) instead of global file handles (open OUT_FILE, ...). I didn't do that in my code snippets above, because I wanted to keep them compatible with the rest of your code, but it would be good practice.
If you're pre-declaring a regular expression, like your $search_text, use qr// instead of a plain string, like this:
my $search_text = qr/ATOM/;
It's slightly more efficient, and the quoting rules for special characters are much saner.
For printing multiple columns from an array, consider using join() and a list slice, as in:
print OUT_FILE join("\t", @fields[2,6,7,8]), "\n";
Finally, if I were you, I'd reconsider my file naming scheme: the output file name a.txt matches your input file name glob *.txt, so your script will likely break if you run it twice in a row.
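Putting those habits together, a sketch of how the whole script might look; the output name out.tsv is a hypothetical choice, picked so it won't match the *.txt input glob on a second run:

#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

my $search_text = qr/ATOM/;

foreach my $file (glob "*/*.txt") {
    print "$file\n";
    my $outname = dirname($file) . "/out.tsv";

    open my $in,  "<", $file    or die "Failed to open $file: $!";
    open my $out, ">", $outname or die "Failed to open $outname: $!";

    while (my $line = <$in>) {
        next unless $line =~ $search_text;
        my @fields = split /\s+/, $line;
        print {$out} join("\t", @fields[2,6,7,8]), "\n";
    }

    close $out;
    close $in;
}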

perl deleting a subdirectory containing a file of specified format

I have to do a program in Perl, and I'm very new to it. The task is: there will be a directory, inside which there will be many subdirectories, and each subdirectory contains further subdirectories; finally there will be files at the end of the chain of subdirectories. If a file has the ".TXT" format, I should delete the subdirectory next to the main directory that contains that .TXT file.
For example, for raghav\abc\ccd\1.txt I should delete the subdirectory "abc".
My code is:
#!/usr/bin/perl
use warnings;
use strict;
use Cwd qw(abs_path);

my $path = "d:\\raghav";
search_all_folder($path);

sub search_all_folder {
    my ($folder) = @_;
    if ( -d $folder ) {
        chdir $folder;
        opendir my $dh, $folder or die "can't open the directory: $!";
        while ( defined( my $file = readdir($dh) ) ) {
            chomp $file;
            next if $file eq '.' or $file eq '..';
            search_all_folder("$folder/$file"); ## recursive call
            read_files($file) if ( -f $file );
        }
        closedir $dh or die "can't close directory: $!";
    }
}

sub read_files {
    my ($filename) = @_;
    if ( $filename =~ /.txt/ ) {
        rmdir;
    }
}
Never ever implement your own directory traversal. Use File::Find instead. It's more efficient and less prone to breaking.
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $search_path = "d:\\raghav";
my %text_files_found_in;

sub find_text_files {
    if (m/\.txt$/i) {
        ## you may want to transform this first, to get a higher-level directory
        $text_files_found_in{$File::Find::dir}++;
    }
}

find( \&find_text_files, $search_path );

foreach my $dir ( keys %text_files_found_in ) {
    print "$dir contained a text file\n";
    ## maybe delete it, but don't until you're sure the code's working!
}
You've now got a list of directories, against which you can figure out what to delete and then delete it. rmdir won't do the trick though - that only works on empty directories. You can either collate the list of directories as this does (the hash keeps out duplicates), or you could figure out the path to delete as you go and insert that into the hash instead.
Either way though - it's probably better to run the find first and delete second, rather than trying to delete a tree you may still be traversing.
Edit: What this program does:

Imports the File::Find module.
Defines the subroutine find_text_files.
Runs find (from the File::Find module), and tells it to run find_text_files on every file it finds in its recursive traversal.
find_text_files is called on every file within the tree (below $search_path). When it's called:
File::Find sets $_ to the current filename. We match against m/\.txt$/i to see if it's a text file.
File::Find also sets two other variables: $File::Find::dir, the directory path to this file, and $File::Find::name, the full file path. We insert $File::Find::dir into the hash %text_files_found_in provided that pattern matches.
Once find is complete, we have a hash %text_files_found_in whose keys are all the directories where a text file was found.
We can then iterate on keys %text_files_found_in to identify - and delete.*
At this point, you may want to transform each of the directories in that list, because it'll be a full path to the file, and you may only want to delete at a higher level.

* There's no delete code in this script - you'll have to sort that yourself. I'm not putting anything that might delete stuff up on the internet where people who don't fully understand it might just run it blind.
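The transformation hinted at in the comment might look something like this: a sketch meant to follow the find() call above, reusing its $search_path and %text_files_found_in (dir_to_delete is a hypothetical helper, not part of File::Find):

# Hypothetical helper: given a directory where a .txt file was found,
# return the first-level subdirectory under the base path
# (e.g. "d:\raghav/abc" for "d:\raghav/abc/ccd").
sub dir_to_delete {
    my ($found_dir, $base) = @_;
    ( my $rel = $found_dir ) =~ s/^\Q$base\E[\/\\]?//;
    my ($top) = split m{[/\\]}, $rel;
    return unless defined $top and length $top;
    return "$base/$top";
}

my %dirs_to_delete;
for my $dir ( keys %text_files_found_in ) {
    my $top = dir_to_delete( $dir, $search_path );
    $dirs_to_delete{$top}++ if defined $top;
}
# %dirs_to_delete now holds the candidates; review them before removing anything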

Merge multiple HTML Files

I am merging multiple HTML files in the directory/subdirectory into a single HTML file within the same directories. I went through some websites and tried the code below:
#!/usr/bin/perl -w
use strict;
use File::Slurp;

my $basedir = 'c:/test';
opendir(DIR, $basedir) or die $!;
my @files = readdir(DIR); # name arrays plural, hashes singular
closedir DIR;

my $outfilename = 'final.htm';
my $outfilesrc = undef;
foreach (sort @files) {
    $outfilesrc .= File::Slurp::slurp("$basedir/$_");
}

open(OUT, "> $basedir/$outfilename") or die ("Can't open for writing: $basedir/$outfilename : $!");
print OUT $outfilesrc;
close OUT;
exit;
But I am getting the following error and could not merge the files:

read_file 'c:/test.' - sysopen: Permission denied at mergehtml.pl line 15

Can anyone help me? Is there any way to merge HTML files into a single file in Perl?
Your error most likely comes from trying to open the "current directory" entry c:/test/. for reading. This comes from using readdir to list the files: readdir includes everything in the directory, including the dot entries.
If all you want to do is concatenate the files, it's rather simple on Linux: cat test/* > final.htm. Unfortunately, on Windows it's a bit more tricky:

perl -pe"BEGIN { @ARGV = map glob, @ARGV }" "C:/test/*" > final.htm

Explanation:
We use the -p option to read and print the contents of the files named in the arguments. The argument here is a glob pattern, and the Windows command shell does not expand globs automagically, so we have to ask perl to do it, with the built-in glob function. We do this in a BEGIN block to separate it from the rest of the code. The "rest of the code" is in this case just (basically) a while (<>) { print } loop that reads and prints the contents of the files. At the end of the command line we redirect all the output to the file final.htm.
Why use glob over readdir? Well, for one thing, readdir includes the directories . (current dir) and .. (parent dir), which will mess up your code, as I mentioned at the top. You would need to filter out the directories, while glob handles this smoothly with no problem.
If you want the longer version of this script, you can do
use strict;
use warnings;

@ARGV = map glob, @ARGV;

while (<>) {
    print;
}
Note that I suspect you only want HTML files to be merged, so it would perhaps be a good idea for you to change your glob from * to something like

*.htm *.html
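For instance, the longer version above might become (a sketch, assuming the files live under C:/test as in the question, and calling the script merge.pl):

use strict;
use warnings;

# cmd.exe won't expand globs for us, so do it in Perl
@ARGV = map glob, 'C:/test/*.htm', 'C:/test/*.html';

while (<>) {
    print;
}

Run it as perl merge.pl > final.htm, and make sure final.htm ends up somewhere the glob won't pick it up on a second run.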
Filter out the files "." and ".." from your @files list.
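For example (a sketch of just that change to the original script):

my @files = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);

Since the inputs are HTML anyway, grepping for /\.html?$/i would skip the dot entries as well.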

Identifying the difference between ordinary folders and "."/".." folders

I'm writing a Perl script to automatically copy PDFs from a folder.
Users are unable to access folders they don't have permission for, so they don't accidentally gain access to any information they aren't supposed to.
I have a rough mock-up which works except for one bug: it keeps seeing the . and .. folders, opens them, and enters an infinite loop.
The following conditional statement checks whether the file is a PDF; if so, it passes the file to my copyPDF sub, which checks for exceptions and then copies the file. Otherwise it treats the entry as a folder, tries to open it, scans that folder's contents and repeats.
I've tried a number of ways to ignore the . and .. entries, but it always ends up ignoring all the other subfolders as well. Has anyone got a workaround?
if ($file =~ /\.pdf$/i) {
    print "$file is a pdf\n";
    $fileLocation = "$directoryName/$file";
    copyPDF("$fileLocation", "$file");
}
elsif ($file == '.') {
    #print "argh\n";
}
else {
    $openFolder = "$directory/$file";
    print "*$openFolder";
    openNextDirectory("$openFolder");
}
Always use use strict; use warnings;!!!
$file == '.'
produces
Argument "." isn't numeric in numeric eq (==)
because you are asking Perl to compare two numbers. You should be using
$file eq '.'
See perldoc perlop for more information about Perl's operators.
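With eq, the skip branch can also cover .. at the same time; a sketch of just that branch:

elsif ($file eq '.' or $file eq '..') {
    # skip the special directory entries
}

The original code only ever tested for ., so even with eq, the .. entry would still send the traversal back up the tree; comparing against both entries fixes that.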
This old question has some great answers that address this and similar questions:
How can I copy a directory recursively and filter filenames in Perl?
Use the File::Find module:

use File::Find;
use File::Copy;

my $directory = 'c:\my\pdf\directory';
find(\&pdfcopy, $directory);

sub pdfcopy {
    my $newdirectory = 'c:\some\new\dir';
    return if $File::Find::name !~ /\.pdf$/i;
    copy($File::Find::name, $newdirectory) or
        die "File $File::Find::name cannot be copied: $!";
}