Optimized way to print directory paths recursively without file comparison in Perl

I have a directory that contains multiple levels of subdirectories, and I want to print the path of every directory.
Currently, I am using:
use File::Find;

find(
    {
        wanted => \&findfiles,
    }, $maindirectory);

sub findfiles
{
    if (-d) {
        push @arrayofdirs, $File::Find::dir;
    }
}
But each subdirectory contains thousands of files at each level. The above code takes a lot of time to produce the result because it tests every file to see whether it is a directory. Is there a way to get the subdirectory paths without examining every file, or some other optimized method?
Edit: This issue was partially resolved by the solution below, but it led to a new problem, which I have listed here: Multiple File search in varying level of directories in perl

If you are on a UNIX/Linux platform, you can try reading the output of the find $maindirectory -type d command into your program (see this answer for a safe way to do that). This command prints the names of all directories under $maindirectory. It is faster because a compiled C program (find) does all the hard work. The following script prints every directory path found.
Sample script:
use strict;
use warnings;

my $maindirectory = '.';

open my $fh, '-|', 'find', $maindirectory, '-type', 'd'
    or die "Can't open pipe: $!";
while (my $dir = <$fh>) {
    print $dir;
}
close $fh or warn "Can't close pipe: $!";
Note that there is little point in calling find from Perl and then just printing its output without any processing; you could just as well run find $maindirectory -type d in the shell itself.
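If you would rather stay in Perl, the CPAN module File::Find::Rule (an extra dependency, so this is only a sketch assuming it is installed) expresses the same directories-only query concisely:

use strict;
use warnings;
use File::Find::Rule;   # CPAN module, not in core

my $maindirectory = '.';

# ->directory restricts matches to directories; ->in descends recursively.
my @dirs = File::Find::Rule->directory->in($maindirectory);
print "$_\n" for @dirs;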

Related

How to extend a program that works for one file to act on every file in a directory

I wrote a program to check for misspellings or unused data in a text file. Now I want to check all the files in a directory using the same process.
Here are a few lines of the script that I run for the first file:
open MYFILE, 'checking1.txt' or die $!;
@arr_file = <MYFILE>;
close (MYFILE);

open FILE_1, '>text1' or die $!;
open FILE_2, '>Output' or die $!;
open FILE_3, '>Output2' or die $!;
open FILE_4, '>text2' or die $!;

for ($i = 0; $i <= $#arr_file; $i++) {
    if ( $arr_file[$i-1] =~ /\s+\S+\_name\s+ (\S+)\;/ ) {
        print FILE_1 "name : $i $1\n";
    }
    ...
I used only one file, checking1.txt, to test the script, but now I want to run the same process on every file in all_file_directory.
Use an array to store the file names and then loop over them. At the end of each iteration, rename the output files or copy them somewhere else so that they do not get overwritten in the next iteration (see the sketch after the code below).
my @files = qw(checking1.txt checking2.txt checking3.txt checking4.txt checking5.txt);

foreach my $filename (@files) {
    open (my $fh, "<", $filename) or die $!;
    # perform operations on $filename using filehandle $fh
    # rename output files
}
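For example, one way to keep the outputs from each iteration is to derive an output name from the input name. This is only a sketch; the _output.txt naming scheme and the placeholder pattern are illustrative assumptions, not part of the original script:

use strict;
use warnings;

my @files = qw(checking1.txt checking2.txt);

foreach my $filename (@files) {
    open my $in, '<', $filename or die "Can't open $filename: $!";

    # Hypothetical naming scheme: checking1.txt -> checking1_output.txt
    (my $outname = $filename) =~ s/\.txt$/_output.txt/;
    open my $out, '>', $outname or die "Can't open $outname: $!";

    while (my $line = <$in>) {
        # Replace this placeholder match with the real checking logic.
        print $out $line if $line =~ /_name/;
    }
    close $out;
    close $in;
}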
Now for the above to work you need to make sure the files are in the same directory. If not, then either:
Provide the absolute path to each file in the @files array
Traverse the directory to find the desired files
If you want to traverse the directory, see these questions (and the sketch below):
How do I read in the contents of a directory in Perl?
How can I recursively read out directories in Perl?
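For instance, a recursive traversal with the core File::Find module might look like this (the .txt filter and the all_file_directory name are assumptions based on the question):

use strict;
use warnings;
use File::Find;

my @files;
find(sub {
    # $_ is the bare name inside the callback; $File::Find::name is the full path.
    push @files, $File::Find::name if -f $_ && /\.txt$/;
}, 'all_file_directory');   # hypothetical top-level directory

print "$_\n" for @files;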
Also:
Use the 3-argument form of open
Always use strict; and use warnings; in your Perl programs
and give proper names to your variables. For example,
@arr_file = <MYFILE>;
would be better written as
@lines = <MYFILE>;
If all your files are in the same directory, you can put the program inside that directory and run it from there.
To read the files from a directory, use glob:
while (my $filename = <*.txt>)   # change the file extension to whatever you want
{
    open my $fh, "<", $filename or die "Error opening $filename: $!\n";
    # do your stuff here
}
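Note that a bare <*.txt> pattern is resolved relative to the current working directory. To read from some other directory, put the path in the pattern (the path below is just an example):

# glob() accepts a full path pattern and returns matches with the path attached.
while (my $filename = glob('/path/to/logs/*.txt')) {
    open my $fh, '<', $filename or die "Error opening $filename: $!\n";
    # ... process $fh here ...
    close $fh;
}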
Why not use File::Find? It makes changing files in directories very easy; just supply the start directory.
It's not always the best choice, depending on your needs, but it's useful and easy almost every time I need to modify a lot of files at once.
Another option is to just loop through the files, but in that case you'll have to supply the file names.
As mkHun pointed out, a glob can be helpful.

How to open multiple files in Perl

Guys, I'm really confused now. I'm new to learning Perl. The book I've been reading sometimes shows Perl code and sometimes Linux commands.
Is there any connection between them (Perl code and Linux commands)?
I want to open multiple files using Perl code. I know how to open a single file in Perl using:
open (MYFILE,'somefileshere');
and I know how to view multiple files in Linux using the ls command.
So how do I do this? Can I use ls in Perl? And I want to open certain files only (Perl files) which don't have a visible file extension (I can't use *.txt etc., I guess).
A little help, guys.
Use the system function to execute a Linux command, and glob to get a list of files:
http://perldoc.perl.org/functions/system.html
http://perldoc.perl.org/functions/glob.html
Like:
my #files = glob("*.h *.m"); # matches all files with a .h or .m extension
system("touch a.txt"); # linux command "touch a.txt"
Directory handles are also quite nice, particularly for iterating over all the files in a directory. Example:
opendir(my $directory_handle, "/path/to/directory/") or die "Unable to open directory: $!";
while (my $file_name = readdir($directory_handle)) {
    next if $file_name =~ /some_pattern/; # Skip files matching pattern
    # readdir returns bare names, so prepend the directory when opening
    open (my $file_handle, '>', "/path/to/directory/$file_name") or warn "Could not open file '$file_name': $!";
    # Write something to $file_name. See perldoc -f open.
    close $file_handle;
}
closedir $directory_handle;

Recursive directory traversal in Perl

I'm trying to write a script that prints out the file structure starting at the folder the script is located in. The script works fine without the recursive call, but with it the script prints the contents of the first folder and then crashes with the following message: closedir() attempted on invalid dirhandle DIR at printFiles.pl line 24. The folders are printed and execution reaches the last line, so why isn't the recursive call done? And how should I solve this instead?
#!/usr/bin/perl -w
printDir(".");

sub printDir {
    opendir(DIR, $_[0]);
    local(@files);
    local(@dirs);
    (@files) = readdir(DIR);
    foreach $file (@files) {
        if (-f $file) {
            print $file . "\n";
        }
        if (-d $file && $file ne "." && $file ne "..") {
            push(@dirs, $file);
        }
    }
    foreach $dir (@dirs) {
        print "\n";
        print $dir . "\n";
        printDir($dir);
    }
    closedir(DIR);
}
You should always use strict; and use warnings; at the start of your Perl program, especially before you ask for help with it. That way Perl will show up a lot of straightforward errors that you may not notice otherwise.
The invalid filehandle error occurs because DIR is a global directory handle and has already been closed by a nested execution of the subroutine. It is best to always use lexical handles for both files and directories, and to test the return code to make sure the open succeeded, like this:
opendir my $dh, $_[0] or die "Failed to open $_[0]: $!";
One advantage of lexical file handles is that they are closed implicitly when they go out of scope, so there is no need for your closedir call at the end of the subroutine.
local isn't meant to be used like that. It doesn't suffice as a declaration, and you are creating a temporary copy of a global variable that everything can access. It is best to use my instead, like this:
my @dirs;
my @files = readdir $dh;
Also, the file names you are using from readdir have no path, and so your file tests will fail unless you either chdir to the directory being processed or append the directory path string to the file name before testing it.
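Putting those fixes together (lexical handles, my, and path-qualified file tests), a corrected version of the subroutine might look like this sketch, which keeps the original's output format:

use strict;
use warnings;

printDir(".");

sub printDir {
    my ($path) = @_;
    opendir my $dh, $path or die "Failed to open $path: $!";
    my @entries = readdir $dh;
    closedir $dh;                      # done with the handle before recursing

    my @dirs;
    foreach my $entry (@entries) {
        next if $entry eq "." || $entry eq "..";
        my $full = "$path/$entry";     # prepend the path so -f/-d test the right thing
        if (-f $full) {
            print "$entry\n";
        } elsif (-d $full) {
            push @dirs, $full;
        }
    }
    foreach my $dir (@dirs) {
        print "\n$dir\n";
        printDir($dir);                # recurse with the full path
    }
}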
Use the File::Find module. The way I usually do this is with the find2perl tool that comes with Perl, which takes the same parameters as find and creates a suitable Perl script using File::Find. Then I fine-tune the generated script to do what I want. It's also possible to use File::Find directly.
Why not use File::Find?
use strict;   # ALWAYS!
use warnings; # ALWAYS!
use File::Find;

find(sub { print "$_\n"; }, ".");
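One detail worth knowing: by default File::Find chdirs into each directory it visits, so $_ inside the callback is the bare file name, while $File::Find::name holds the full path. To print full paths instead:

use strict;
use warnings;
use File::Find;

# $File::Find::name holds the full path of the current file or directory.
find(sub { print "$File::Find::name\n"; }, ".");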

How can I create a path with all of its subdirectories in one shot in Perl?

If you have a path to a file (for example, /home/bob/test/foo.txt) where each subdirectory in the path may or may not exist, how can I create the file foo.txt using "/home/bob/test/foo.txt" as the only input, instead of creating every nonexistent directory in the path one by one and finally creating foo.txt itself?
You can use File::Basename and File::Path
use strict;
use File::Basename;
use File::Path qw/make_path/;

my $file = "/home/bob/test/foo.txt";
my $dir  = dirname($file);
make_path($dir);
open my $fh, '>', $file or die "Ouch: $!\n"; # now go do stuff w/file
I didn't add any tests to see whether the file already exists, but that's pretty easy to add with Perl; a minimal sketch follows.
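For instance, reusing $file and $dir from the snippet above, and assuming you want to skip existing files rather than truncate them:

# Skip the whole dance if the file is already there.
unless (-e $file) {
    make_path($dir) unless -d $dir;
    open my $fh, '>', $file or die "Ouch: $!\n";
    close $fh;
}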
Use make_dir from File::Util
use File::Util;
my($f) = File::Util->new();
$f->make_dir('/var/tmp/tempfiles/foo/bar/');
# optionally specify a creation bitmask to be used in directory creations
$f->make_dir('/var/tmp/tempfiles/foo/bar/',0755);
I don't think there's a standard function that can do all of what you ask, directly from the filename.
But mkpath(), from the module File::Path, can almost do it, given the filename's directory. From the File::Path docs:
The "mkpath" function provides a convenient way to create directories, even if your "mkdir" kernel call won't create more than one level of directory at a time.
Note that mkpath() does not report errors in a nice way: it dies instead of just returning zero, for some reason.
Given all that, you might do something like:
use File::Basename;
use File::Path;

my $fname = "/home/bob/test/foo.txt";
eval {
    local $SIG{'__DIE__'}; # ignore user-defined die handlers
    mkpath(dirname($fname));
};
my $fh;
if ($@) {
    print STDERR "Error creating dir: $@";
} elsif (!open($fh, ">", $fname)) {
    print STDERR "Error creating file: $!\n";
}
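As an aside, modern versions of File::Path also export make_path, which can collect errors into a data structure instead of dying; a sketch:

use File::Path qw(make_path);

# Collect failures instead of dying; $errors becomes a ref to an array
# of { path => message } hashes (empty if everything succeeded).
make_path('/home/bob/test', { error => \my $errors });
for my $diag (@$errors) {
    my ($dir, $msg) = %$diag;
    warn "Problem creating '$dir': $msg\n";
}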

How can I scan multiple log files to find which ones have a particular IP address in them?

Recently there have been a few attackers trying malicious things on my server, so I've decided to somewhat "track" them, even though I know they won't get very far.
Now, I have an entire directory containing the server logs, and I need a way to search through every file in the directory and return a filename if a string is found. So I thought to myself: what better language to use for text and file operations than Perl? My friend is helping me with a script that scans all files for a certain IP and returns the filenames that contain the IP, so I don't have to search for the attacker through every log manually. (I have hundreds.)
#!/usr/bin/perl
$dir = ".";
opendir(DIR, "$dir");
@files = grep(/\.*$/, readdir(DIR));
closedir(DIR);

foreach $file (@files) {
    open FILE, "$file" or die "Unable to open files";
    while (<FILE>) {
        print if /12.211.23.200/;
    }
}
although it is giving me directory read errors. Any assistance is greatly appreciated.
EDIT: Code edited; it is still saying "permission denied, cannot open directory" on line 10. I am just going to run the script from within the logs directory, if you are questioning the directory change to ".".
Mike.
Can you use grep instead?
To get all the lines containing the IP, I would use grep directly; there's no need to list the files first, it's a simple command:
grep 12\.211\.23\.200 *
I like to pipe it to another file and then open that file in an editor...
If you insist on wanting the filenames, it's also easy
grep -l 12\.211\.23\.200 *
grep is available on every Unix/Linux system with the GNU tools, or on Windows using one of the many implementations (unxutils, Cygwin, etc.).
You have to concatenate $dirname with $filename when using files found through readdir; remember you haven't chdir'ed into the directory where those files reside.
open FH, "<", "$dirname/$filename" or die "Cannot open $filename: $!";
Incidentally, why not just use grep -r to recursively search all subdirectories under your log dir for your string?
EDIT: I see your edits, and two things. First, this line:
@files = grep(/\.*$/,readdir(DIR));
Is not effective, because you are searching for zero or more . characters at the end of the string. Since it's zero or more, it'll match everything in the directory. If you're trying to exclude files ending in ., try this:
@files = grep(!/\.$/,readdir(DIR));
Note the ! sign for negation if you're trying to exclude those files. Otherwise (if you only want those files and I'm misunderstanding your intent), leave the ! out.
In any case, if you're getting your die message on line 10, most likely you're hitting a file that has permissions such that you can't read it. Try putting the filename in the die output so you can see which file it's failing on:
open FILE, "$file" or die "Unable to open file: $file";
But as with other answers, and to reiterate: Why not use grep? The unix command, not the Perl function.
This will get the file names you are looking for in Perl, and probably do it much faster than looping and matching with a Perl regex yourself.
@files = `find ~/ServerLogs -name "*.log" | xargs grep -l "<ip address>"`;
Although this will require a *nix-compliant system, or Cygwin on Windows.
Firstly, get a list of files within your source directory:
opendir(DIR, "$dir");
@files = grep(/\.log$/, readdir(DIR));
closedir(DIR);
And then loop through those files:
foreach $file (@files) {
    # file processing code
}
My first suggestion would be to use grep instead. The right tool for the job, as they say...
But to answer your question:
readdir just returns the filenames from the directory. You'll need to concatenate the directory name and filename together.
$path = "$dirname/$filname";
open FH, $path or die ...
Then you should ignore files that are actually directories, such as "." and "..". After building $path, check to see whether it's a file:
if (-f $path) {
    open FH, $path or die ...;
    while (<FH>) {
        ...
    }
}
BTW, I thought I would throw in a mention of File::Next. To iterate over all files in a directory (recursively):
use feature 'say'; # needed for say()
use Path::Class;   # always useful.
use File::Next;

my $files = File::Next::files( dir(qw/path to files/) ); # look in path/to/files
while ( defined( my $file = $files->() ) ) {
    $file = file( $file );
    say "Examining $file";
    say "found foo" if $file->slurp =~ /foo/;
}
File::Next is taint-safe.
~ doesn't auto-expand in Perl.
opendir my $fh, '~/' or die("Doin It Wrong"); # Doing It Wrong.
opendir my $fh, glob('~/') and die("That's right!");
Also, if you must use readdir(), make sure you guard the expression thus:
while (defined(my $filename = readdir(DH))) {
...
}
If you don't do the defined() test, the loop will terminate if it finds a file called '0'.
Have you looked on CPAN for log parsers? I searched for 'log parse' and it yielded over 200 hits. Some (probably many) won't be relevant, but some may be. It depends, in part, on which web server you are using.
Am I reading this right? Your line 10 that gives you the error is
open FILE, "$file" or die "Unable to open files";
And the $file you are trying to read, according to line 6,
@files = grep(/\.*$/,readdir(DIR));
is a file whose name ends with zero or more dots, which basically matches every file in the directory, including "." and "..". Is this what you really wanted? Maybe you don't have enough permission to open the parent directory for reading?
EDIT: if you only want to read all files (including hidden ones), you might want to use something like the following:
opendir(DIR, ".");
#files = readdir(DIR);
closedir(DIR);
foreach $file (#files) {
if ($file ne "." and $file ne "..") {
open FILE, "$file" or die "cannot open $file\n";
# do stuff with FILE
}
}
Note that this doesn't take care of subdirectories.
I know I am way late to this discussion (I ran across it while searching for grep-related posts), but I am going to answer anyway:
It isn't clearly specified whether these are web server logs (Apache, IIS, W3SVC, etc.), but the best tool for mining those for data is Microsoft's LogParser tool. See logparser.com for more info.
LogParser lets you write SQL-like statements against the log files. It is very flexible and very fast.
Use Perl from the command line, like a better grep:
perl -wnl -e '/12\.211\.23\.200/ and print;' *.log > output.txt
The benefit here is that you can chain logic far more easily:
perl -wnl -e '(/12\.211\.23\.2(?:0[1-9]|1[01])/ or /denied/i) and print;' *.log
If you are feeling wacky, you can also use more advanced command-line options to feed one Perl one-liner's results into other Perl one-liners.
You really should read "Minimal Perl: For UNIX and Linux People", an awesome book on this very sort of thing.
First, use grep.
But if you don't want to, here are two small improvements you can make that I haven't seen mentioned yet:
1) Change:
@files = grep(/\.*$/,readdir(DIR));
to
@files = grep({ !-d "$dir/$_" } readdir(DIR));
This way you will exclude not just "." and ".." but also any other subdirectories that may exist in the server log directory (which the open downstream would otherwise choke on).
2) Change:
print if /12.211.23.200/;
to
print if /12\.211\.23\.200/;
"." is a regex wildcard meaning "any character". Changing it to "\." will reduce the number of false positives (unlikely to change your results in practice but it's more correct anyway).