How can I copy a directory recursively and filter filenames in Perl?

How do I copy a directory, including subdirectories, excluding files or directories that match a certain regex, on a Windows system?

I'd do something like this:
use strict;
use warnings;
use File::Copy;

sub copy_recursively {
    my ($from_dir, $to_dir, $regex) = @_;

    opendir my $dh, $from_dir or die "Could not open dir '$from_dir': $!";
    for my $entry (readdir $dh) {
        next if $entry eq '.' or $entry eq '..';
        next if $entry =~ /$regex/;
        my $source      = "$from_dir/$entry";
        my $destination = "$to_dir/$entry";
        if (-d $source) {
            mkdir $destination or die "mkdir '$destination' failed: $!" if not -e $destination;
            copy_recursively($source, $destination, $regex);
        } else {
            copy($source, $destination) or die "copy failed: $!";
        }
    }
    closedir $dh;
    return;
}

Another option is File::Xcopy. As the name says, it more or less emulates the Windows xcopy command, including its filtering and recursive options.
From the documentation:
use File::Xcopy;

my $fx = new File::Xcopy;
$fx->from_dir("/from/dir");
$fx->to_dir("/to/dir");
$fx->fn_pat('(\.pl|\.txt)$');        # files with .pl and .txt extensions
$fx->param('s', 1);                  # search recursively into sub dirs
$fx->param('verbose', 1);            # verbose output
$fx->param('log_file', '/my/log/file.log');
my ($sr, $rr) = $fx->get_stat;
$fx->xcopy;                          # or
$fx->execute('copy');

# the same with the short name
$fx->xcp("from_dir", "to_dir", "file_name_pattern");

If you happen to be on a Unix-like OS and have access to rsync(1), you should use that (for example through system()).
Perl's File::Copy is a bit broken (it doesn't copy permissions on Unix systems, for example), so if you don't want to use your system tools, look at CPAN. Maybe File::Copy::Recursive could be of use, but I don't see any exclude options. I hope somebody else has a better idea.
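To illustrate the rsync suggestion above, here is a minimal sketch driven from Perl. It assumes rsync is installed, the paths are placeholders, and note that --exclude takes glob-style patterns rather than Perl regexes:
use strict;
use warnings;

my ($from_dir, $to_dir, $exclude) = ('/from/dir', '/to/dir', '*.bak');

# -a preserves permissions, times and symlinks; the trailing "/" on the source
# copies its contents rather than the directory itself.
system('rsync', '-a', "--exclude=$exclude", "$from_dir/", $to_dir) == 0
    or die "rsync failed: $?";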

I don't know how to do an exclusion with a copy, but you could work something up along the lines of:
ls -R1 | grep -v <regex to exclude> | awk '{printf("cp %s /destination/path\n",$1)}' | /bin/sh

A classic answer would use 'cpio -p':
(cd $SOURCE_DIR; find . -type f -print) |
perl -ne 'print unless m/<regex-goes-here>/' |
cpio -pd $TARGET_DIR
The 'cpio' command deals with the actual copying, including permission preservation. The trick of 'cd $SOURCE_DIR; find . ...' deals with removing the leading part of the source path from the names. The only problem with that invocation of 'find' is that it won't follow symlinks; you need to add '-follow' if that's what you want.
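If you want to drive the same idea from Perl, a rough sketch could build the file list with File::Find and feed it to cpio over a pipe (the paths and the exclude regex below are placeholders):
use strict;
use warnings;
use File::Find;

my ($source_dir, $target_dir, $skip) = ('/source/dir', '/target/dir', qr/\.bak$/);

chdir $source_dir or die "chdir $source_dir: $!";

# cpio -pd copies the listed files into $target_dir, creating directories as needed.
open my $cpio, '|-', 'cpio', '-pd', $target_dir
    or die "Cannot run cpio: $!";

find(sub {
    return unless -f;
    # paths are relative to $source_dir because we chdir'ed there first
    print {$cpio} "$File::Find::name\n" unless $File::Find::name =~ $skip;
}, '.');

close $cpio or die "cpio reported failure: $?";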

Related

File::Find in Perl - Looking for files only

I have a script like this to list every FILE inside my root path:
use strict;
use File::Find qw(find);

my $path = "<my root path>";
find(\&Search, $path);

sub Search {
    my $filename = $File::Find::name;
    if (-f $filename) {
        print $filename . "\n";
    }
}
My point is to list only the FILES. However, it also listed the symlinks inside my root path. I modified my Search function like this and it worked:
sub Search {
    my $filename = $File::Find::name;
    # Check that $filename is not a symlink first
    if (!-l $filename) {
        if (-f $filename) {
            print $filename . "\n";
        }
    }
}
But it seems awkward, right? Why do we need two if conditions just to verify that $filename is a real file and not a symlink?
Can anyone suggest a better, cleaner solution for this?
Thank you and best regards.
Alex
-f tests for a plain file, but it follows symlinks, so a symlink that points to a plain file passes too. So yes, you do have to test both.
One slightly useful thing is that you can probably just do:
if ( -f and not -l ) {
because File::Find sets $_ to the current file, and the file tests default to using $_ too (this won't work if you turn on no_chdir, though).
You may also want to consider File::Find::Rule as an alternative to File::Find.
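For example, a minimal File::Find::Rule sketch that lists plain files while skipping symlinks might look like this (the path is a placeholder):
use strict;
use warnings;
use File::Find::Rule;

my $path = "<my root path>";

# ->file matches anything that passes -f, which follows symlinks,
# so filter symlinks out afterwards with -l.
my @files = grep { !-l } File::Find::Rule->file->in($path);
print "$_\n" for @files;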
stat and lstat are identical except when it comes to symlinks. The former collects information about the linked file, whereas the latter collects information about the link itself.
The -X EXPR file tests use stat; lstat is what is needed here.
use feature qw(say);    # needed for say

sub Search {
    my $filename = $File::Find::name;
    if (!lstat($filename)) {
        warn("Can't stat $filename: $!\n");
        return;
    }
    say $filename if -f _;    # -f _ reuses the results of the lstat above
}
Bonus: Error checking becomes much simpler when you pre-call stat or lstat.

find command problems in perl script

I am writing a script that will create a new tar file containing only those files that were created after the previous tar.gz file was created.
my $path_to_logs = "/home/myscripts/";
my $FNAME = `ls -t *.tar.gz | head -n1`;
my $FILENAME = $path_to_logs . $FNAME;
chomp($FILENAME);

if (-e $FILENAME) {
    my $changed = `find . -name '*.log' -newer $FILENAME`;
    chomp $changed;
    $command = "tar -cvzT " . $changed . " -f deleteme-$(date +%Y-%m-%d-%H-%M-%S).tar.gz";
    chomp $command;
    print $command;
}
However, the output of $command shows each of the find results on a new line, so I don't get one concatenated command for tar. Any idea why?
Thanks.
How about this to solve your immediate problem:
my $find_cmd = "find . -name '*.log' -newer $filename";
open my $in, '-|', $find_cmd or die "Couldn't run command. $!\n";
while (<$in>) {
    chomp;
    print "Do something with file: $_\n";
}
If you need the files on a single line, you can collect them into a variable and join them; I just wanted to show you a better way to call a system command (it would be even better to call the command directly without going through the shell, but you are relying on the shell here).
In the long run you might want to learn how to use Perl's own File::Find and its wanted routine, and how to do directory globbing, instead of relying so much on the system.
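For instance, a rough sketch of that File::Find approach might look like this (the reference tarball name is illustrative):
use strict;
use warnings;
use File::Find;

my $reference = '/home/myscripts/latest.tar.gz';    # hypothetical newest tarball
my $cutoff    = (stat $reference)[9];               # its mtime

my @newer;
find(sub {
    # $_ is the bare file name; find() has already chdir'ed into its directory
    push @newer, $File::Find::name
        if -f $_ && /\.log\z/ && (stat _)[9] > $cutoff;
}, '.');

print "$_\n" for @newer;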
Just transform the output from find into a single line:
my $changed= `find . -name '*.log' -newer $FILENAME`;
chomp $changed;
$changed =~ s/\n/ /g;
$command = "tar -cvzT -f deleteme-$(date +%Y-%m-%d-%H-%M-%S).tar.gz " . $changed;
By the way, in general it's often better to reduce your dependency on OS-specific features. You can duplicate everything you're shelling out to the OS for in Perl with not too much effort:
use strict;
use warnings;
use autodie;
use File::Find::Rule;
use File::stat;
use Time::Piece;

my $path_to_logs = "/home/myscripts/";

# Sort newest first so the first element is the most recent tarball
my ($FILENAME) = sort {
    stat($b)->mtime <=> stat($a)->mtime
} glob('/home/myscripts/*.tar.gz');

if (-e $FILENAME) {
    my $modified = stat($FILENAME)->mtime;
    my @files = File::Find::Rule->file()
                                ->name('*.log')
                                ->mtime(">$modified")
                                ->in('.');
    my $datenow = localtime->strftime('%Y-%m-%d-%H-%M-%S');
    my $command = "tar -cvz -f deleteme-${datenow}.tar.gz " . join(' ', @files);
}
You could even use Archive::Tar instead of /bin/tar, though that could come with a performance hit.
Regardless, these simple changes make your script much more portable and don't require much additional code.
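If you do go the Archive::Tar route, a hedged sketch might look like this (the file list and archive name are illustrative):
use strict;
use warnings;
use Archive::Tar;

my @files = glob('*.log');    # whatever list of files you collected above
my $tar = Archive::Tar->new;
$tar->add_files(@files);
$tar->write('deleteme.tar.gz', COMPRESS_GZIP)
    or die "Could not write archive: " . $tar->error;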
It doesn't make much sense to do this in Perl anyway. Regardless of the wrapper language, it would be simpler and more robust to pipe the find output straight to tar, which knows how to handle it.
Anyway, your use of tar's -T option is wrong. It expects a file name containing file names, one per line, not a list of file names.
Also, your FNAME contains the newest file in the current directory, where apparently the intent is to find the newest file in /home/myscripts.
Finally, the $(date ...) interpolation will not be interpolated by Perl, but trivially works if you convert this (back?) to a shell script.
#!/bin/sh
path_to_logs="/home/myscripts/"
FNAME=$(cd "$path_to_logs"; ls -t *.tar.gz | head -n1)
FILENAME="$path_to_logs/$FNAME"
if [ -e "$FILENAME" ]; then
    find . -name '*.log' -newer "$FILENAME" |
        tar -c -v -z -T - -f "deleteme-$(date +%Y-%m-%d-%H-%M-%S).tar.gz"
fi

Perl - locate the latest subdirectory on a network path and copy the entire contents

I want to locate the latest subdirectory on a network path and copy its entire contents into another folder on the network path.
We have a lot of subfolders under the folder \\10.184.132.202\projectdump. I need to sort the subfolders to find the latest one and copy its entire contents into another folder on \\10.184.132.203\baseline.
With the script below I am able to list the most recently modified folder under the directory, but I do not know how to copy its contents.
use File::stat;
use File::Copy qw(copy);

$dirname = '\\\\10.184.132.202\\projectdump\\Testing\\';
$destination = '\\\\10.184.132.203\\baseline\\Testing\\';
$timediff = 0;

opendir DIR, "$dirname";
while (defined($sub_dir = readdir(DIR))) {
    if ($sub_dir ne "." && $sub_dir ne "..") {
        $diff = time() - stat("$dirname/$sub_dir")->mtime;
        if ($timediff == 0) {
            $timediff = $diff;
            $newest   = $sub_dir;
        }
        if ($diff < $timediff) {
            $timediff = $diff;
            $newest   = $sub_dir;
        }
    }
}
print $newest, "\n";

open my $in, '<', $newest or die $!;
while (<$in>) {
    copy *, $destination;   # --------> Here I want to copy the entire contents of $newest to $destination.
}
Use File::Copy::Recursive. It allows you to copy entire directory trees. Unfortunately, it is not a core Perl module, but you can install it via the cpan command.
If installing modules is a problem (it sometimes is), you can use File::Find to walk the directory tree and copy files one at a time.
By the way, you can use forward slashes in Perl for Windows file names, so you don't have to double up on backslashes.
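For example, once $newest holds the full path of the latest subdirectory, a minimal File::Copy::Recursive sketch (UNC paths taken from the question, written with forward slashes; the subdirectory name is a placeholder) could be:
use strict;
use warnings;
use File::Copy::Recursive qw(dircopy);

my $newest      = '//10.184.132.202/projectdump/Testing/<latest subdir>';
my $destination = '//10.184.132.203/baseline/Testing/';

dircopy($newest, $destination)
    or die "Cannot copy $newest to $destination: $!";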
Why not call a simple shell command to find the latest directory?
I think this will be much simpler in the shell...
my $newestdir=`ls -1rt $dirname|tail -n 1`;
in shell:
LATESTDIR=`ls -1rt $dirname|tail -n 1`
cp -r ${LATESTDIR}/* $destination/
Oops, I just realized that you might be using Windows...
Get all the directories and their mtimes into a hash, then sort the hash in descending order of mtime to find the newest one:
my ($newest) = sort { $hash{$b} <=> $hash{$a} } keys %hash;
then
opendir NDIR, $newest;
while (defined(my $dir = readdir NDIR)) {
    next if $dir eq '.' or $dir eq '..';
    copy "$newest/$dir", $destination;
}
closedir NDIR;

How to traverse Subversion repository to find specific file, and stop searching further down?

I have this problem: given a Subversion repository http://svn/trunk/, I want to search the whole repository to find/list all files named exp.xml (their whole URLs). Once the first occurrence has been found, I want it to stop searching further down that URL. Just to make it clear, here are some fictitious URLs:
http://svn/trunk/pro1/sub-pro-x/exp.xml/sub-pro-x1/exp.xml
http://svn/trunk/pro2/sub-pro-y/pro-y1/exp.xml/sub-pro-y1/exp.xml
http://svn/trunk/pro3/sub-pro-z/exp.xml/sub-pro-z1/exp.xml/sub-proj/exp.xml
The result should be:
http://svn/trunk/pro1/sub-pro-x/exp.xml
http://svn/trunk/pro2/sub-pro-y/pro-y1/exp.xml
http://svn/trunk/pro3/sub-pro-z/exp.xml
Now I already have a solution, but it's not very efficient, because I grep for exp.xml only after svn list -R has searched the whole repository (30-40 minutes). In case you want to know, here is the command:
svn list -R http://svn/trunk | grep /exp.xml
So my question is whether it is possible to speed up this query significantly. One thing I am thinking of is to use some language, preferably Perl, to traverse http://svn/trunk/ directly and process all the links, stopping the traversal down a path as soon as it finds the first exp.xml.
Thanks for your time.
If you want it to be faster, I would try checking out the SVN project and then searching the files on disk. You could perform a search using "find" in the checked-out sandbox (where "." assumes you are in the top directory of your project):
find . -name 'exp.xml'
but, similar to your grep solution, I don't think it achieves your "stop searching further" criterion. If you want a Perl script that searches for exp.xml but stops recursing into a directory once it finds a match there, try this (it takes the top-level directory as an argument):
#!/usr/bin/env perl
use warnings;
use strict;

my @dirs = $ARGV[0];
my @files;

DIR:
while (my $dir = shift @dirs) {
    opendir(my $dh, $dir) or die "Couldn't open dir $dir: $!";
    my @new_dirs;
    while (my $file = readdir($dh)) {
        # skip special directories (".", "..", and ".svn")
        next if $file =~ /^\./;
        # turn file into the correct relative path
        $file = "$dir/$file";
        if (-d $file) {
            push @new_dirs, $file;
        }
        if ($file eq "$dir/exp.xml") {
            # if we matched, go to the next outer loop so we don't recurse further
            push @files, $file;
            next DIR;
        }
    }
    # if we didn't match any files, we need to check sub-dirs
    push @dirs, @new_dirs;
}
print "$_\n" for @files;
Use svn ls [URL] or svn ls -R [URL] with your script to list the SVN repository starting at [URL]. See svn ls --help for more info.
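A hedged sketch of that idea, shelling out to a non-recursive svn ls per directory and not descending below the first exp.xml match (the starting URL is taken from the question):
use strict;
use warnings;

my @queue = ('http://svn/trunk/');
my @found;

while (my $url = shift @queue) {
    my @entries = `svn ls "$url"`;    # one entry per line, directories end with "/"
    chomp @entries;
    if (grep { $_ eq 'exp.xml' } @entries) {
        push @found, "${url}exp.xml";
        next;                         # stop searching further down this branch
    }
    push @queue, map { "$url$_" } grep { m{/$} } @entries;
}

print "$_\n" for @found;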

How can I scan multiple log files to find which ones have a particular IP address in them?

Recently there have been a few attackers trying malicious things on my server, so I've decided to somewhat "track" them even though I know they won't get very far.
Now, I have an entire directory containing the server logs, and I need a way to search through every file in the directory and return a filename if a string is found. So I thought to myself, what better language for text and file operations than Perl? A friend is helping me with a script to scan all files for a certain IP and return the filenames that contain it, so I don't have to search for the attacker through every log manually (I have hundreds).
#!/usr/bin/perl
$dir = ".";
opendir(DIR, "$dir");
@files = grep(/\.*$/, readdir(DIR));
closedir(DIR);

foreach $file (@files) {
    open FILE, "$file" or die "Unable to open files";
    while (<FILE>) {
        print if /12.211.23.200/;
    }
}
although it is giving me directory read errors. Any assistance is greatly appreciated.
EDIT: Code edited; it still says "permission denied, cannot open directory" on line 10. In case you are wondering about the directory change to ".": I am just going to run the script from within the logs directory.
Mike.
Can you use grep instead?
To get all the lines containing the IP, I would use grep directly; there is no need to produce a list of files first, it's a simple command:
grep 12\.211\.23\.200 *
I like to pipe it to another file and then open that file in an editor...
If you insist on wanting the filenames, that's also easy:
grep -l 12\.211\.23\.200 *
grep is available on every Unix/Linux system with the GNU tools, and on Windows through one of the many implementations (unxutils, cygwin, etc.).
You have to concatenate $dirname with $filename when using files found through readdir; remember that you haven't chdir'ed into the directory where those files reside.
open FH, "<", "$dirname/$filename" or die "Cannot open $filename: $!";
Incidentally, why not just use grep -r to recursively search all subdirectories under your log dir for your string?
EDIT: I see your edits, and two things. First, this line:
@files = grep(/\.*$/,readdir(DIR));
is not effective, because you are searching for zero or more . characters at the end of the string. Since it's zero or more, it'll match everything in the directory. If you're trying to exclude files ending in ., try this:
@files = grep(!/\.$/,readdir(DIR));
Note the ! sign for negation if you're trying to exclude those files. Otherwise (if you only want those files and I'm misunderstanding your intent), leave the ! out.
In any case, if you're getting your die message on line 10, most likely you're hitting a file that has permissions such that you can't read it. Try putting the filename in the die output so you can see which file it's failing on:
open FILE, "$file" or die "Unable to open file: $file";
But as with other answers, and to reiterate: Why not use grep? The unix command, not the Perl function.
This will get the file names you are looking for into Perl, and probably do it much faster than reading each file and applying a regex in Perl.
@files = `find ~/ServerLogs -name "*.log" | xargs grep -l "<ip address>"`;
Although, this will require a *nix compliant system, or Cygwin on Windows.
Firstly, get a list of files within your source directory:
opendir(DIR, "$dir");
@files = grep(/\.log$/, readdir(DIR));
closedir(DIR);
And then loop through those files:
foreach $file (@files)
{
    # file processing code
}
My first suggestion would be to use grep instead. The right tool for the job, as they say...
But to answer your question:
readdir just returns the filenames from the directory. You'll need to concatenate the directory name and filename together.
$path = "$dirname/$filename";
open FH, $path or die ...
Then you should ignore files that are actually directories, such as "." and "..". After getting the $path, check to see if it's a file.
if (-f $path) {
    open FH, $path or die ...;
    while (<FH>) { ... }
}
BTW, I thought I would throw in a mention of File::Next. To iterate over all files in a directory (recursively):
use Path::Class;      # always useful.
use File::Next;
use feature qw(say);  # for say

my $files = File::Next::files( dir(qw/path to files/) );    # look in path/to/files
while ( defined( my $file = $files->() ) ) {
    $file = file( $file );
    say "Examining $file";
    say "found foo" if $file->slurp =~ /foo/;
}
File::Next is taint-safe.
~ doesn't auto-expand in Perl.
opendir my $fh, '~/' or die("Doin It Wrong");          # Doing It Wrong: the '~' is taken literally.
opendir my $fh, glob('~/') or die("Cannot open: $!");  # That's right: glob() expands the '~'.
Also, if you must use readdir(), make sure you guard the expression thus:
while (defined(my $filename = readdir(DH))) {
    ...
}
If you don't do the defined() test, the loop will terminate if it finds a file called '0'.
Have you looked on CPAN for log parsers? I searched with 'log parse' and it yielded over 200 hits. Some (probably many) won't be relevant - some may be. It depends, in part, on which web server you are using.
Am I reading this right? Your line 10 that gives you the error is
open FILE, "$file" or die "Unable to open files";
And the $file you are trying to read, according to line 6,
@files = grep(/\.*$/,readdir(DIR));
is a file that ends with zero or more dots. Is this what you really wanted? This basically matches every file in the directory, including "." and "..". Maybe you don't have enough permission to open the parent directory for reading?
EDIT: if you only want to read all files (including hidden ones), you might want to use something like the following:
opendir(DIR, ".");
#files = readdir(DIR);
closedir(DIR);
foreach $file (#files) {
if ($file ne "." and $file ne "..") {
open FILE, "$file" or die "cannot open $file\n";
# do stuff with FILE
}
}
Note that this doesn't take care of sub directories.
I know I am way late to this discussion (I ran across it while searching for grep-related posts), but I am going to answer anyway:
It isn't clearly specified whether these are web server logs (Apache, IIS, W3SVC, etc.), but the best tool for mining those for data is Microsoft's LogParser tool. See logparser.com for more info.
LogParser will allow you to write SQL-like statements against the log files. It is very flexible and very fast.
Use perl from the command line, like a better grep:
perl -wnl -e '/12\.211\.23\.200/ and print;' *.log > output.txt
The benefit here is that you can chain logic far more easily:
perl -wnl -e '(/12\.211\.23\.2(0[1-9]|1[01])/ or /denied/i) and print;' *.log
If you are feeling wacky, you can also use more advanced command-line options to feed the results of one Perl one-liner into other one-liners.
You really should read "Minimal Perl: For UNIX and Linux People", an awesome book on this very sort of thing.
First, use grep.
But if you don't want to, here are two small improvements you can make that I haven't seen mentioned yet:
1) Change:
@files = grep(/\.*$/,readdir(DIR));
to
@files = grep({ !-d "$dir/$_" } readdir(DIR));
This way you will exclude not just "." and ".." but also any other subdirectories that may exist in the server log directory (which the open downstream would otherwise choke on).
2) Change:
print if /12.211.23.200/;
to
print if /12\.211\.23\.200/;
"." is a regex wildcard meaning "any character". Changing it to "\." will reduce the number of false positives (unlikely to change your results in practice but it's more correct anyway).